Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Find latest row in date-stamped sets

SeanAdams
17 - Castor
17 - Castor

Hi all,

Many of our data sets are stamped with  date-time stamp pairs denoting validity date - so for each key (e.g. person ID) you have various updates denoted by the dates (like a slowly changing dimension table)

 

So to find the latest version of a record you could:

- Summarise to find the max-date for each personID; and then join this back to the main data set or....

- Sort by the key ascending; and by the validity date from field descending - and then use a unique tool to filter out duplicates (thinking here is that if you sort this way - then the unique tool will take the first instance of person ID; and then throw away the others)

 

However - it seems that the second one doesn't guarantee the maximum because the records start processing through the unique before the sort is finished.    Not sure if it would be guaranteed to work if we added a block component between the sort and the distinct so that the distinct is guaranteed to get the full sorted set.  

Have any of you experienced this and done any testing to confirm if the sort/unique method does guarantee the same outcome?  

 

It feels at a gut-level to be more computationally efficient (in a big-O sense but that does depend on how this is handled under the hood) and it looks neater too - but if it doesn't work then it's back to the old summarize and join  .... :-)

 

 

2 REPLIES 2
pcatterson
11 - Bolide

I don't think the Stop Until Done tool is necessary.  The result appears unchanged with and without.  Both are the same as when summarizing and rejoining.

SeanAdams
17 - Castor
17 - Castor

Thank you for the reply - just done some tests on a 180M row set and confirmed what you said (same result with and without a block until done; and exactly the same result as using a summarize with join back to the set, but about 4x faster using sort and distinct).

 

The way that you generated dates in a loop was also helpful - I've not seen this done before, but it makes sense (and has opened up my thinking about the capability of this row generator tool).

 

Appreciate it, thank you

Sean

Labels