Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Remove data lines with same key

adamhill3
7 - Meteor

I currently have a data set which has three classifications

 

  1. Draft
  2. Valid
  3. Invalid

As a Document goes from draft to valid to invalid a new data line is created and the old one remains in the data set. This is causing issues and I therefore only want to keep the latest data line.

 

What I need to do is group the Document IDs and only keep the most recent record, however I cannot work out logic that doesn't disrupt the rest of my data.

 

Below is a sample of data to help explain the issue:

 

Document IDDocument TitleStatusContact IDContactDocument Valid fromDocument Valid toValues
4001Title AActive123ABC31/08/2016 12:00 AM01/04/2020 12:00 AM500
4002Title BDraft124ABD01/05/2019 12:00 AM01/06/2020 12:00 AM752
4003Tile CActive125ABE06/07/2017 12:00 AM31/03/2023 12:00 AM425
4001Title AInactive123ABC02/03/2016 12:00 AM01/04/2021 12:00 AM500
4002Title BActive124ABD02/04/2014 12:00 AM03/04/2020 12:00 AM752
4001Title ADraft123ABC06/08/2018 12:00 AM06/08/2023 12:00 AM120
4004Title DDraft126ABF01/09/2016 12:00 AM22/02/2022 12:00 AM

681

 

The desired output would be:

 

Document IDDocument TitleStatusContact IDContactDocument Valid fromDocument Valid toValues
4003Tile CActive125ABE06/07/2017 12:00 AM31/03/2023 12:00 AM425
4001Title AInactive123ABC02/03/2016 12:00 AM01/04/2021 12:00 AM500
4002Title BActive124ABD02/04/2014 12:00 AM03/04/2020 12:00 AM752
4004Title DDraft126ABF01/09/2016 12:00 AM22/02/2022 12:00 AM681

 

Any help or guidance appreciated

7 REPLIES 7
Jonathan-Sherman
15 - Aurora
15 - Aurora

Hi @adamhill3,

 

I think this is what you're looking to achieve? Use the sample for the "First N Rows" where N = 1 and group by your Document ID column

 

image.png

 

If this solves your issue please mark the answer as correct, if not let me know! I've attached my workflow for you to download if needed.

 

Regards,

Jonathan

Kenda
16 - Nebula
16 - Nebula

Hey @adamhill3 

 

I would recommend first using a Formula tool with this expression to create a new field that is a date type:

DateTimeParse([Document Valid from],"%d/%m/%Y")

 

From there, you can use a Sort to order your data first by Document ID (Asc) then by this new date field (Desc). Finally, use a Sample tool to only keep the first N and make sure to group by Doc ID.

 

Hope this helps!

adamhill3
7 - Meteor

Thank for your input.

 

This does not solve the problem though as fr Document B it returns the draft document instead of the active

adamhill3
7 - Meteor

Hi @Kenda 

 

Sorry to not be clear. The date columns have no baring on the desired output. It is purely the status.

Kenda
16 - Nebula
16 - Nebula

Ah gotchya @adamhill3 ! No problem. Instead of sorting by date, then, in the initial formula tool, I would recommend creating a numeric ID column relating to your status with this expression:

 

iif([Status]="Draft",1,iif([Status]="Active",2,3))

 

 

Then sort by Doc ID (Asc) then this new ID (Desc) and use the Sample tool to take the first N and grouping by Doc ID. 

 

So very similar to the previous approach but sorting on the numeric ID we created based on the status now instead of the date.

Jonathan-Sherman
15 - Aurora
15 - Aurora

Hi @adamhill3,

 

In that case could you create an "ID number" for each status, sort on the new "ID number" for status and use the sample method?

 

image.png

 

If this solves your issue please mark the answer as correct, if not let me know! I've attached an updated workflow for you to download if needed.

 

Regards,

Jonathan

adamhill3
7 - Meteor

Apologies for the delayed response.

 

Thank you @Kenda and @Jonathan-Sherman . These resolve the issue for me.

Labels
Top Solution Authors