I currently have a data set which has three classifications
As a Document goes from draft to valid to invalid a new data line is created and the old one remains in the data set. This is causing issues and I therefore only want to keep the latest data line.
What I need to do is group the Document IDs and only keep the most recent record, however I cannot work out logic that doesn't disrupt the rest of my data.
Below is a sample of data to help explain the issue:
Document ID | Document Title | Status | Contact ID | Contact | Document Valid from | Document Valid to | Values |
4001 | Title A | Active | 123 | ABC | 31/08/2016 12:00 AM | 01/04/2020 12:00 AM | 500 |
4002 | Title B | Draft | 124 | ABD | 01/05/2019 12:00 AM | 01/06/2020 12:00 AM | 752 |
4003 | Tile C | Active | 125 | ABE | 06/07/2017 12:00 AM | 31/03/2023 12:00 AM | 425 |
4001 | Title A | Inactive | 123 | ABC | 02/03/2016 12:00 AM | 01/04/2021 12:00 AM | 500 |
4002 | Title B | Active | 124 | ABD | 02/04/2014 12:00 AM | 03/04/2020 12:00 AM | 752 |
4001 | Title A | Draft | 123 | ABC | 06/08/2018 12:00 AM | 06/08/2023 12:00 AM | 120 |
4004 | Title D | Draft | 126 | ABF | 01/09/2016 12:00 AM | 22/02/2022 12:00 AM | 681 |
The desired output would be:
Document ID | Document Title | Status | Contact ID | Contact | Document Valid from | Document Valid to | Values |
4003 | Tile C | Active | 125 | ABE | 06/07/2017 12:00 AM | 31/03/2023 12:00 AM | 425 |
4001 | Title A | Inactive | 123 | ABC | 02/03/2016 12:00 AM | 01/04/2021 12:00 AM | 500 |
4002 | Title B | Active | 124 | ABD | 02/04/2014 12:00 AM | 03/04/2020 12:00 AM | 752 |
4004 | Title D | Draft | 126 | ABF | 01/09/2016 12:00 AM | 22/02/2022 12:00 AM | 681 |
Any help or guidance appreciated
Solved! Go to Solution.
Hi @adamhill3,
I think this is what you're looking to achieve? Use the sample for the "First N Rows" where N = 1 and group by your Document ID column
If this solves your issue please mark the answer as correct, if not let me know! I've attached my workflow for you to download if needed.
Regards,
Jonathan
Hey @adamhill3
I would recommend first using a Formula tool with this expression to create a new field that is a date type:
DateTimeParse([Document Valid from],"%d/%m/%Y")
From there, you can use a Sort to order your data first by Document ID (Asc) then by this new date field (Desc). Finally, use a Sample tool to only keep the first N and make sure to group by Doc ID.
Hope this helps!
Thank for your input.
This does not solve the problem though as fr Document B it returns the draft document instead of the active
Hi @Kenda
Sorry to not be clear. The date columns have no baring on the desired output. It is purely the status.
Ah gotchya @adamhill3 ! No problem. Instead of sorting by date, then, in the initial formula tool, I would recommend creating a numeric ID column relating to your status with this expression:
iif([Status]="Draft",1,iif([Status]="Active",2,3))
Then sort by Doc ID (Asc) then this new ID (Desc) and use the Sample tool to take the first N and grouping by Doc ID.
So very similar to the previous approach but sorting on the numeric ID we created based on the status now instead of the date.
Hi @adamhill3,
In that case could you create an "ID number" for each status, sort on the new "ID number" for status and use the sample method?
If this solves your issue please mark the answer as correct, if not let me know! I've attached an updated workflow for you to download if needed.
Regards,
Jonathan
Apologies for the delayed response.
Thank you @Kenda and @Jonathan-Sherman . These resolve the issue for me.