Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Row value dependent data filtering and statistics on the filtered rows

AKB
8 - Asteroid

Suppose we have data that comes in a random times, say at 1, 5, 22, 25, 36, 41, ..... minutes. I am interested, for each row, how many entries exist for some number of minutes thereafter, e.g. say for within 15 minutes later. For the first entry timestamp (1), there is just 1 entry (at time value 5) within 15 minutes later. For the second entry (5), there are no entries (as 22 > 5 + 15). For the third entry (22), there are two, at 25 and 36 minutes, and so on. I'd like to both count the number of entries and then do other operations on the extracted sets. How might this be done, please? My data can be millions of records and pivoting is likely impossible due to memory constraints. Thx - 

3 REPLIES 3
Prometheus
12 - Quasar

@AKB There's probably an easier way to do this, but in my workflow, I brought in all the 15-minute increments of a day and connected it and the dataset of time stamps to an Append Fields tool to create a record for every time stamp and every 15-minute increment for each time stamp. Then I created a field from the date (not datetime) of the time stamp field and concatenated it with the 15-min increments. The result is a DateTime field that has every date from the time stamp field with its associated 15-min increments. Then I calculated the difference between Date (Every 15 min) and TimeStamp then filtered out everything that is not between 0 and 15 min. Finally, I grouped the dates together and counted how many records passed through the Filter tool. Please let me know if this is close to what you're looking for. 

Appends.PNG

Every 15.PNG

Diff.PNG

Filtered Diff.PNG

Count 15 min inc.PNG

AKB
8 - Asteroid

Hi Prometheus,

 

Thank you for your reply. If I understand correctly, you predefine contiguous 15 minute blocks over the whole day and then look for data within them. Is that correct? In my example case, I would have a block that starts at each entry value, e.g. at 5 minutes, and so not necessarily at 0, 15, 30, .... minutes. . My data might also continue past midnight. We could define windows for every minute of the day, though this would then create a lot of them. In essence, I'd like to have windows from 1 to 16 minutes, then 5 to 20, then 22 to 37 and so on, defined by the values of the data itself. 

 

If we use your data, with events at 0,3,5,6,9,15,21 and 24 minutes each day, then we would have 5 events after the 0 minute entry, 4 events after the 3 minute entry, 3 events after the 5 minute entry, 3 after the 6 entry, 3 after the 9 entry, 2 after the 15 entry, 1 after the 21 and 0 after the 24.

 

The difference is then a data dependent set of time windows versus a fixed set thereof. 

 

Thanks again - 

 

 

AKB
8 - Asteroid

Oh, and I'd also like to extract the values that fall within each window and then operate on them, getting for example their average value, etc. We could then say there were e.g. 4 events after the first one, with an average value of 5.6 minutes after the first, with a minimum spacing of 2 minutes between, and so on. 

 

Thanks for your interest - 

Labels