Dear Forum Members:
Please see my data below:
| ProductGroup | Item | DateTime | Value | Qty | 
| Grp-01 | Item001 | 18/03/2021 06:12 | 10 | 5 | 
| Grp-01 | Item001 | 18/03/2021 06:13 | 9 | 8 | 
| Grp-01 | Item001 | 18/03/2021 06:14 | 8 | 2 | 
| Grp-01 | Item001 | 18/03/2021 06:15 | 10 | 3 | 
| Grp-01 | Item001 | 18/03/2021 06:16 | 2 | 1 | 
| Grp-01 | Item001 | 18/03/2021 06:20 | 3 | 2 | 
| Grp-01 | Item002 | 18/03/2021 04:12 | 10 | 5 | 
| Grp-01 | Item002 | 18/03/2021 06:04 | 9 | 8 | 
| Grp-01 | Item002 | 18/03/2021 05:14 | 8 | 2 | 
| Grp-01 | Item002 | 18/03/2021 06:11 | 10 | 3 | 
| Grp-01 | Item002 | 18/03/2021 06:22 | 2 | 1 | 
| Grp-01 | Item002 | 18/03/2021 06:24 | 3 | 2 | 
| Grp-01 | Item003 | 18/03/2021 04:12 | 10 | 5 | 
| Grp-01 | Item003 | 18/03/2021 06:04 | 9 | 8 | 
| Grp-01 | Item003 | 18/03/2021 05:14 | 8 | 2 | 
| Grp-01 | Item003 | 18/03/2021 06:11 | 10 | 3 | 
| Grp-01 | Item003 | 18/03/2021 06:19 | 2 | 1 | 
| Grp-01 | Item003 | 18/03/2021 06:21 | 3 | 2 | 
| Grp-01 | Item004 | 18/03/2021 03:12 | 10 | 5 | 
| Grp-01 | Item004 | 18/03/2021 03:34 | 9 | 8 | 
| Grp-01 | Item004 | 18/03/2021 04:11 | 8 | 2 | 
| Grp-01 | Item004 | 18/03/2021 05:41 | 10 | 3 | 
| Grp-01 | Item004 | 18/03/2021 06:01 | 2 | 1 | 
| Grp-01 | Item004 | 18/03/2021 06:24 | 3 | 2 | 
Green and bold rows are the desired output for further analysis, below desired output from the above table:
| ProductGroup | Item | DateTime | Value | Qty | 
| Grp-01 | Item001 | 18/03/2021 06:20 | 3 | 2 | 
| Grp-01 | Item002 | 18/03/2021 06:22 | 2 | 1 | 
| Grp-01 | Item003 | 18/03/2021 06:21 | 3 | 2 | 
| Grp-01 | Item004 | 18/03/2021 06:24 | 3 | 2 | 
I have groups and each group contains 2 or 4 items, Each group items has more than one row for each day, every row is timestamped.
I need to pick up 2/4 items from each group with the closest datetime matched, if they matched more than 1 as well then all the entries (possibly group by Product Group, Item, Date) but the example above is showing one day entries for the items in a group.
The above is all one dataset.
Do You think this is possible?
thank You in advance.
Solved! Go to Solution.
Hi @Data-Nuker ,
If you convert the Datetime field into a format that Alteryx can understand as a date, then you can use a summarize tool to find the most recent timestamp for each product group and Item, and only keep those records.
Is this what you are looking for?
Cheers
Angelos
Thank You very much for your prompt response, amazingly quick response, many thanks.
I did that but Max can have two issues, I should have clarify in my original post, sorry for that:
1. it will have max date selected for each group whereas I required it for each day, this is not an issue, I can have 2 columns one for date and other for datetime and then will group by.
2. This is the most important issue using max for me, suppose the time is not closest to others for example below:
| ProductGroup | Item | DateTime | Value | Qty | 
| Grp-01 | Item001 | 18/03/2021 05:12 | 10 | 5 | 
| Grp-01 | Item001 | 18/03/2021 05:19 | 9 | 8 | 
| Grp-01 | Item001 | 18/03/2021 06:14 | 8 | 2 | 
| Grp-01 | Item001 | 18/03/2021 06:15 | 10 | 3 | 
| Grp-01 | Item001 | 18/03/2021 06:16 | 2 | 1 | 
| Grp-01 | Item001 | 18/03/2021 06:20 | 3 | 2 | 
| Grp-01 | Item002 | 18/03/2021 04:12 | 10 | 5 | 
| Grp-01 | Item002 | 18/03/2021 04:20 | 9 | 8 | 
| Grp-01 | Item002 | 18/03/2021 04:34 | 8 | 2 | 
| Grp-01 | Item002 | 18/03/2021 04:44 | 10 | 3 | 
| Grp-01 | Item002 | 18/03/2021 05:20 | 2 | 1 | 
| Grp-01 | Item002 | 18/03/2021 06:21 | 3 | 2 | 
| Grp-01 | Item003 | 18/03/2021 04:12 | 10 | 5 | 
| Grp-01 | Item003 | 18/03/2021 06:04 | 9 | 8 | 
| Grp-01 | Item003 | 18/03/2021 05:14 | 8 | 2 | 
| Grp-01 | Item003 | 18/03/2021 06:11 | 10 | 3 | 
| Grp-01 | Item003 | 18/03/2021 06:19 | 2 | 1 | 
| Grp-01 | Item003 | 18/03/2021 06:21 | 3 | 2 | 
| Grp-01 | Item004 | 18/03/2021 03:12 | 10 | 5 | 
| Grp-01 | Item004 | 18/03/2021 03:34 | 9 | 8 | 
| Grp-01 | Item004 | 18/03/2021 04:11 | 8 | 2 | 
| Grp-01 | Item004 | 18/03/2021 05:19 | 10 | 3 | 
| Grp-01 | Item004 | 18/03/2021 06:01 | 2 | 1 | 
| Grp-01 | Item004 | 18/03/2021 06:29 | 3 | 2 | 
From the above table desired output now:
| ProductGroup | Item | DateTime | Value | Qty | 
| Grp-01 | Item001 | 18/03/2021 05:19 | 9 | 8 | 
| Grp-01 | Item002 | 18/03/2021 05:20 | 2 | 1 | 
| Grp-01 | Item003 | 18/03/2021 05:14 | 8 | 2 | 
| Grp-01 | Item004 | 18/03/2021 05:19 | 10 | 3 | 
There is a possibility that an item in a group have recorded more than 1 time but have a difference in hours from others probably an hour or two hours, so using max will not give me my required results.
I hope this makes more clear?
Apologies @Data-Nuker , it's not clearer.
In the last example you provided, what's the rule that you followed to highlight those 4 rows for each Product Group and Item No?
Hi, If You check the time in the dataset for each item in a group, You will notice various entries, What I am trying to look is the closest datetime match, I have added a Max(DateTime), it tells You each of the max datetime but for item002 as there is no entries after 5:20 so I need to work a logic that will try and look closest to this datetime for other items as well, see in the below table:
| ProductGroup | Item | DateTime | Max(DateTime) | Value | Qty | 
| Grp-01 | Item001 | 18/03/2021 05:19 | 18/03/2021 06:20 | 9 | 8 | 
| Grp-01 | Item002 | 18/03/2021 05:20 | 18/03/2021 05:20 | 2 | 1 | 
| Grp-01 | Item003 | 18/03/2021 05:14 | 18/03/2021 06:21 | 8 | 2 | 
| Grp-01 | Item004 | 18/03/2021 05:19 | 18/03/2021 06:29 | 10 | 3 | 
I need somehow a way which will return for each group all items closest date time.
In summary i need to note the datetime for each item in a group which are close enough to each other probably 1-30 minutes away.
I hope this clarifies the question? let me know please, many thanks in advance
Hi @Data-Nuker I think I've got a solution here.
I find the time difference between every item combo, then find the smallest one per item combo, then find the smallest total time difference per record, then find the smallest on of those. Having found that, I can rejoin in stages to find which item combo that relates to, and finally use the Make Group tool to get back to our desired answer.
Hope that helps 🙂
Hi @Data-Nuker I just tried my solution on your original data set, and you get 2 Item002 (because there were two records the same difference in time).
If you don't mind which record is actually returned then I'd suggest a summarise tool at the end which takes the first record per ProductGroup and Item. I've attached the updated flow here
Hi @OllieClarke - Exactly what I am looking for, fantastic, it works like a charm on my sample data of 2 groups. I have now tried run another test with sample of 5 groups having more than 8K rows and the workflow took 6 minutes and 23 seconds, In my actual datasets I have more than 100K rows, is there any possibility to optimize this? if it is not possible then its not a big deal, I have to re-visit my initial dataset to see if I can reduce.
Most importantly, I have got the logic, thank You @OllieClarke
This really helps.
Many thanks
Hey @Data-Nuker, my approach is a bit tricky to optimise as it relies on exploding the data. I've tweaked the workflow a bit below which makes it a bit quicker (using find and replace instead of joins where possible, and also attempting to limit the extent of the data explosion at the start). I've also made it a bit more robust to having multiple product groups too.
I’ve been doing some testing, firstly with 2 unequal product groups (all the duplicate records in the same group). Running it on my laptop with the AMP engine turned on (I'd strongly recommend this). With 96,000 records it ran in 52 minutes, with 24,000 records it ran in 2:11 and with 12,000 records it ran in 24 seconds. 
I did try turning it into a batch macro (because then you could group by product group and keep the first join as an append), but that ran slower (28 seconds for 12,000 records and 2:18 for 24,000 records). 
The biggest problem was having lots of records within a product group - 24,000 records with 2 product groups took 5 times longer to run than 24,000 records with 13 product groups. With the batch macro approach and 13 product groups, 120,000 records ran in 4:13. The workflow approach with the same data ran in 9:49. So if your data has roughly equal size product groups, and a fair few of them, then the batch macro is the way to go.
I’ve attached everything below. Hopefully that helps a bit 🙂
Ollie
 
					
				
				
			
		

