I currently have a data set which has three classifications * Draft * Valid * Invalid As a Document goes from draft to valid to invalid a new data line is created and the old one remains in the data set. This is causing issues and I therefore only want to keep the latest data line. What I need to do is group the Document IDs and only keep the most recent record, however I cannot work out logic that doesn't disrupt the rest of my data. Below is a sample of data to help explain the issue: Document IDDocument TitleStatusContact IDContactDocument Valid fromDocument Valid toValues4001Title AActive123ABC31/08/2016 12:00 AM01/04/2020 12:00 AM5004002Title BDraft124ABD01/05/2019 12:00 AM01/06/2020 12:00 AM7524003Tile CActive125ABE06/07/2017 12:00 AM31/03/2023 12:00 AM4254001Title AInactive123ABC02/03/2016 12:00 AM01/04/2021 12:00 AM5004002Title BActive124ABD02/04/2014 12:00 AM03/04/2020 12:00 AM7524001Title ADraft123ABC06/08/2018 12:00 AM06/08/2023 12:00 AM1204004Title DDraft126ABF01/09/2016 12:00 AM22/02/2022 12:00 AM681 The desired output would be: Document IDDocument TitleStatusContact IDContactDocument Valid fromDocument Valid toValues4003Tile CActive125ABE06/07/2017 12:00 AM31/03/2023 12:00 AM4254001Title AInactive123ABC02/03/2016 12:00 AM01/04/2021 12:00 AM5004002Title BActive124ABD02/04/2014 12:00 AM03/04/2020 12:00 AM7524004Title DDraft126ABF01/09/2016 12:00 AM22/02/2022 12:00 AM681 Any help or guidance appreciated

Remove data lines with same key

I currently have a data set which has three classifications

Draft
Valid
Invalid

As a Document goes from draft to valid to invalid a new data line is created and the old one remains in the data set. This is causing issues and I therefore only want to keep the latest data line.

What I need to do is group the Document IDs and only keep the most recent record, however I cannot work out logic that doesn't disrupt the rest of my data.

Below is a sample of data to help explain the issue:

Document ID	Document Title	Status	Contact ID	Contact	Document Valid from	Document Valid to	Values
4001	Title A	Active	123	ABC	31/08/2016 12:00 AM	01/04/2020 12:00 AM	500
4002	Title B	Draft	124	ABD	01/05/2019 12:00 AM	01/06/2020 12:00 AM	752
4003	Tile C	Active	125	ABE	06/07/2017 12:00 AM	31/03/2023 12:00 AM	425
4001	Title A	Inactive	123	ABC	02/03/2016 12:00 AM	01/04/2021 12:00 AM	500
4002	Title B	Active	124	ABD	02/04/2014 12:00 AM	03/04/2020 12:00 AM	752
4001	Title A	Draft	123	ABC	06/08/2018 12:00 AM	06/08/2023 12:00 AM	120
4004	Title D	Draft	126	ABF	01/09/2016 12:00 AM	22/02/2022 12:00 AM	681

The desired output would be:

Document ID	Document Title	Status	Contact ID	Contact	Document Valid from	Document Valid to	Values
4003	Tile C	Active	125	ABE	06/07/2017 12:00 AM	31/03/2023 12:00 AM	425
4001	Title A	Inactive	123	ABC	02/03/2016 12:00 AM	01/04/2021 12:00 AM	500
4002	Title B	Active	124	ABD	02/04/2014 12:00 AM	03/04/2020 12:00 AM	752
4004	Title D	Draft	126	ABF	01/09/2016 12:00 AM	22/02/2022 12:00 AM	681

Any help or guidance appreciated

Datasets

Accepted answers

Kenda

Ah gotchya @adamhill3 ! No problem. Instead of sorting by date, then, in the initial formula tool, I would recommend creating a numeric ID column relating to your status with this expression:

iif([Status]="Draft",1,iif([Status]="Active",2,3))

Then sort by Doc ID (Asc) then this new ID (Desc) and use the Sample tool to take the first N and grouping by Doc ID.

So very similar to the previous approach but sorting on the numeric ID we created based on the status now instead of the date.

Jonathan-Sherman

Hi @adamhill3,

In that case could you create an "ID number" for each status, sort on the new "ID number" for status and use the sample method?

If this solves your issue please mark the answer as correct, if not let me know! I've attached an updated workflow for you to download if needed.

Regards,

Jonathan

20200402 - KeepNewestRecord v2.yxmd

All comments

Jonathan-Sherman

Hi @adamhill3,

I think this is what you're looking to achieve? Use the sample for the "First N Rows" where N = 1 and group by your Document ID column

If this solves your issue please mark the answer as correct, if not let me know! I've attached my workflow for you to download if needed.

Regards,

Jonathan

20200402 - KeepNewestRecord.yxmd

Kenda

Hey @adamhill3

I would recommend first using a Formula tool with this expression to create a new field that is a date type:

DateTimeParse([Document Valid from],"%d/%m/%Y")

From there, you can use a Sort to order your data first by Document ID (Asc) then by this new date field (Desc). Finally, use a Sample tool to only keep the first N and make sure to group by Doc ID.

Hope this helps!

adamhill3

Thank for your input.

This does not solve the problem though as fr Document B it returns the draft document instead of the active

adamhill3

Hi @Kenda

Sorry to not be clear. The date columns have no baring on the desired output. It is purely the status.

Quick Links

This months top contributors

atcodedog05 19598

Qiu 15867

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703