SOLVED

Closest datetime match within a group for multiple items with multiple entries

Data-Nuker
7 - Meteor

Dear Forum Members:

 

Please see my data below:

ProductGroup  Item     DateTime          Value/Qty
Grp-01        Item001  18/03/2021 06:12  105
Grp-01        Item001  18/03/2021 06:13  98
Grp-01        Item001  18/03/2021 06:14  82
Grp-01        Item001  18/03/2021 06:15  103
Grp-01        Item001  18/03/2021 06:16  21
Grp-01        Item001  18/03/2021 06:20  32
Grp-01        Item002  18/03/2021 04:12  105
Grp-01        Item002  18/03/2021 06:04  98
Grp-01        Item002  18/03/2021 05:14  82
Grp-01        Item002  18/03/2021 06:11  103
Grp-01        Item002  18/03/2021 06:22  21
Grp-01        Item002  18/03/2021 06:24  32
Grp-01        Item003  18/03/2021 04:12  105
Grp-01        Item003  18/03/2021 06:04  98
Grp-01        Item003  18/03/2021 05:14  82
Grp-01        Item003  18/03/2021 06:11  103
Grp-01        Item003  18/03/2021 06:19  21
Grp-01        Item003  18/03/2021 06:21  32
Grp-01        Item004  18/03/2021 03:12  105
Grp-01        Item004  18/03/2021 03:34  98
Grp-01        Item004  18/03/2021 04:11  82
Grp-01        Item004  18/03/2021 05:41  103
Grp-01        Item004  18/03/2021 06:01  21
Grp-01        Item004  18/03/2021 06:24  32

 

The green, bold rows are the ones I want for further analysis; below is the desired output from the table above:

 

ProductGroup  Item     DateTime          Value/Qty
Grp-01        Item001  18/03/2021 06:20  32
Grp-01        Item002  18/03/2021 06:22  21
Grp-01        Item003  18/03/2021 06:21  32
Grp-01        Item004  18/03/2021 06:24  32

 

I have groups, and each group contains 2 to 4 items. Each item in a group has more than one row per day, and every row is timestamped.

 

For each group, I need to pick one row per item (2 to 4 items) where the datetimes are the closest match to each other. If there is more than one such match, I need all of those entries (possibly grouping by Product Group, Item and Date), but the example above only shows one day's entries for the items in a group.

 

The above is all one dataset.

 

Do you think this is possible?

 

Thank you in advance.

8 REPLIES
AngelosPachis
16 - Nebula

Hi @Data-Nuker ,

 

If you convert the DateTime field into a format that Alteryx understands as a date, you can use a Summarize tool to find the most recent timestamp for each Product Group and Item, and keep only those records.

 

[Workflow screenshot: AngelosPachis_0-1618565474882.png]
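If it helps to sanity-check that logic outside Designer, a rough pandas equivalent would look something like this (the file name and column names are only assumptions based on the sample data):

```python
import pandas as pd

# Assumed input: the sample table above, saved as a CSV with these column names.
df = pd.read_csv("items.csv")
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%d/%m/%Y %H:%M")

# Equivalent of a Summarize tool grouped on ProductGroup + Item with Max(DateTime),
# then keeping only the matching records.
latest = df.loc[df.groupby(["ProductGroup", "Item"])["DateTime"].idxmax()]
print(latest)
```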

 

Is this what you are looking for?

 

Cheers

 

Angelos

 

Data-Nuker
7 - Meteor

Hi @AngelosPachis 

 

Thank you very much for your amazingly quick response, many thanks.

 

I did that, but Max can have two issues; I should have clarified this in my original post, sorry about that:

 

1. It selects the max date for each group, whereas I need it for each day. This is not really an issue: I can have two columns, one for the date and one for the datetime, and then group by them.

 

2. This is the most important issue with using Max for me: suppose an item's latest time is not the closest to the others, as in the example below:

 

ProductGroup  Item     DateTime          Value/Qty
Grp-01        Item001  18/03/2021 05:12  105
Grp-01        Item001  18/03/2021 05:19  98
Grp-01        Item001  18/03/2021 06:14  82
Grp-01        Item001  18/03/2021 06:15  103
Grp-01        Item001  18/03/2021 06:16  21
Grp-01        Item001  18/03/2021 06:20  32
Grp-01        Item002  18/03/2021 04:12  105
Grp-01        Item002  18/03/2021 04:20  98
Grp-01        Item002  18/03/2021 04:34  82
Grp-01        Item002  18/03/2021 04:44  103
Grp-01        Item002  18/03/2021 05:20  21
Grp-01        Item002  18/03/2021 06:21  32
Grp-01        Item003  18/03/2021 04:12  105
Grp-01        Item003  18/03/2021 06:04  98
Grp-01        Item003  18/03/2021 05:14  82
Grp-01        Item003  18/03/2021 06:11  103
Grp-01        Item003  18/03/2021 06:19  21
Grp-01        Item003  18/03/2021 06:21  32
Grp-01        Item004  18/03/2021 03:12  105
Grp-01        Item004  18/03/2021 03:34  98
Grp-01        Item004  18/03/2021 04:11  82
Grp-01        Item004  18/03/2021 05:19  103
Grp-01        Item004  18/03/2021 06:01  21
Grp-01        Item004  18/03/2021 06:29  32

 

From the table above, the desired output is now:

 

ProductGroup  Item     DateTime          Value/Qty
Grp-01        Item001  18/03/2021 05:19  98
Grp-01        Item002  18/03/2021 05:20  21
Grp-01        Item003  18/03/2021 05:14  82
Grp-01        Item004  18/03/2021 05:19  103

 

There is a possibility that an item in a group has been recorded more than once but differs from the others by an hour or two, so using Max will not give me the required results.

 

I hope this makes it clearer?

AngelosPachis
16 - Nebula

Apologies @Data-Nuker, it's still not clear to me.

 

In the last example you provided, what's the rule that you followed to highlight those 4 rows for each Product Group and Item No?

Data-Nuker
7 - Meteor

Hi, if you check the times in the dataset for each item in a group, you will notice various entries. What I am trying to find is the closest datetime match. I have added a Max(DateTime) column: it shows the max datetime for each item, but Item002 has no entries after 05:20, so I need a logic that looks for the record closest to this datetime for the other items as well. See the table below:

 

ProductGroup  Item     DateTime          Max(DateTime)     Value/Qty
Grp-01        Item001  18/03/2021 05:19  18/03/2021 06:20  98
Grp-01        Item002  18/03/2021 05:20  18/03/2021 05:20  21
Grp-01        Item003  18/03/2021 05:14  18/03/2021 06:21  82
Grp-01        Item004  18/03/2021 05:19  18/03/2021 06:29  103

 

I somehow need a way that, for each group, returns the closest datetime match across all of its items.

 

In summary, I need to pick the datetime for each item in a group such that they are close enough to each other, probably within 1-30 minutes.
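Just to illustrate the rule, here is a rough pandas sketch of what I mean (not a workflow; the file and column names are assumed from the table above): anchor on the item whose last entry is earliest, then take the record closest to that time for every item.

```python
import pandas as pd

# Assumed input: the sample data above, with columns ProductGroup, Item, DateTime.
df = pd.read_csv("sample.csv")
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%d/%m/%Y %H:%M")

def closest_per_item(group):
    # Anchor on the earliest of the per-item Max(DateTime) values
    # (Item002's 05:20 in the example above)...
    anchor = group.groupby("Item")["DateTime"].max().min()
    # ...then, for every item, keep the record closest to that anchor time.
    idx = (group["DateTime"] - anchor).abs().groupby(group["Item"]).idxmin()
    return group.loc[idx]

result = df.groupby("ProductGroup", group_keys=False).apply(closest_per_item)
print(result)
```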

 

I hope this clarifies the question? Please let me know; many thanks in advance.

OllieClarke
16 - Nebula

Hi @Data-Nuker I think I've got a solution here.

I find the time difference between every item combo, then find the smallest one per item combo, then find the smallest total time difference per record, and then find the smallest one of those. Having found that, I can rejoin in stages to find which item combo that relates to, and finally use the Make Group tool to get back to our desired answer.
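If you want to sanity-check the objective in Python before building it in Designer, below is a brute-force pandas sketch of what the workflow is effectively optimising: try every combination of one record per item and keep the combination whose timestamps are tightest. It is not a translation of the staged joins themselves, and the file and column names are assumed from the sample data.

```python
from itertools import combinations, product

import pandas as pd

# Assumed input: the sample data, with columns ProductGroup, Item, DateTime.
df = pd.read_csv("sample.csv")
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%d/%m/%Y %H:%M")

def tightest_combo(group):
    # One list of candidate records per item in this product group.
    per_item = [list(rows.itertuples()) for _, rows in group.groupby("Item")]
    best, best_score = None, None
    # Try every combination of one record per item;
    # score a combination by its total pairwise time difference.
    for combo in product(*per_item):
        score = sum(abs((a.DateTime - b.DateTime).total_seconds())
                    for a, b in combinations(combo, 2))
        if best_score is None or score < best_score:
            best, best_score = combo, score
    return pd.DataFrame(best).drop(columns="Index")

result = pd.concat(tightest_combo(g) for _, g in df.groupby("ProductGroup"))
print(result)
```

Brute force is only reasonable here because each group has 2-4 items; the workflow reaches the same answer through the staged joins and the Make Group tool.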

 

Hope that helps 🙂

 

[Screenshot: OllieClarke_0-1618571320146.png]

[Screenshot: OllieClarke_1-1618571332264.png]

 

OllieClarke
16 - Nebula

Hi @Data-Nuker I just tried my solution on your original dataset, and you get two Item002 rows (because there were two records with the same time difference).

[Screenshot: OllieClarke_0-1618571519445.png]

If you don't mind which of the tied records is actually returned, I'd suggest a Summarize tool at the end that takes the first record per ProductGroup and Item. I've attached the updated flow here.

[Screenshot: OllieClarke_1-1618571622962.png]
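In the brute-force pandas sketch above, the equivalent of that final Summarize would be something like this (again with the assumed column names):

```python
# Keep only the first record per ProductGroup + Item when ties produce duplicates.
deduped = result.drop_duplicates(subset=["ProductGroup", "Item"], keep="first")
```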

 

 

Data-Nuker
7 - Meteor

Hi @OllieClarke - exactly what I am looking for, fantastic, it works like a charm on my sample data of 2 groups. I have now run another test with a sample of 5 groups containing more than 8K rows, and the workflow took 6 minutes and 23 seconds. In my actual datasets I have more than 100K rows; is there any possibility to optimize this? If not, it's not a big deal, and I will revisit my initial dataset to see if I can reduce it.

 

Most importantly, I have got the logic. Thank you @OllieClarke

 

This really helps.

 

Many thanks

OllieClarke
16 - Nebula

Hey @Data-Nuker, my approach is a bit tricky to optimise as it relies on exploding the data. I've tweaked the workflow below to make it quicker (using find and replace instead of joins where possible, and attempting to limit the extent of the data explosion at the start). I've also made it more robust to having multiple product groups.

I've been doing some testing, firstly with 2 unequal product groups (all the duplicate records in the same group), running it on my laptop with the AMP engine turned on (I'd strongly recommend this). With 96,000 records it ran in 52 minutes, with 24,000 records it ran in 2:11, and with 12,000 records it ran in 24 seconds.

 

I did try turning it into a batch macro (because then you could group by product group and keep the first join as an append), but that ran slower (28 seconds for 12,000 records and 2:18 for 24,000 records). 

The biggest problem was having lots of records within a product group - 24,000 records with 2 product groups took 5 times longer to run than 24,000 records with 13 product groups. With the batch macro approach and 13 product groups, 120,000 records ran in 4:13. The workflow approach with the same data ran in 9:49. So if your data has roughly equal size product groups, and a fair few of them, then the batch macro is the way to go.
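Purely to illustrate the "limit the explosion" idea (this is not how the attached workflow or macro does it): in the pandas sketch from earlier, you could pre-trim each item to its few records nearest a rough per-group anchor before the expensive comparison step, so the combination count stays small even when a product group is large.

```python
def prune(group, k=5):
    # Rough anchor: the earliest of the per-item last timestamps in this product group.
    anchor = group.groupby("Item")["DateTime"].max().min()
    diff = (group["DateTime"] - anchor).abs()
    # Keep only each item's k records nearest the anchor.
    return (group.assign(_diff=diff)
                 .sort_values("_diff")
                 .groupby("Item", group_keys=False)
                 .head(k)
                 .drop(columns="_diff"))

# 'df' is the DataFrame from the earlier sketch; run tightest_combo on 'pruned' instead.
pruned = df.groupby("ProductGroup", group_keys=False).apply(prune)
```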

 

I’ve attached everything below. Hopefully that helps a bit 🙂

 

Ollie
