Alteryx Designer Desktop Discussions

adamhill3 · ‎04-02-2020

I currently have a data set which has three classifications

Draft
Valid
Invalid

As a Document goes from draft to valid to invalid a new data line is created and the old one remains in the data set. This is causing issues and I therefore only want to keep the latest data line.

What I need to do is group the Document IDs and only keep the most recent record, however I cannot work out logic that doesn't disrupt the rest of my data.

Below is a sample of data to help explain the issue:

Document ID	Document Title	Status	Contact ID	Contact	Document Valid from	Document Valid to	Values
4001	Title A	Active	123	ABC	31/08/2016 12:00 AM	01/04/2020 12:00 AM	500
4002	Title B	Draft	124	ABD	01/05/2019 12:00 AM	01/06/2020 12:00 AM	752
4003	Tile C	Active	125	ABE	06/07/2017 12:00 AM	31/03/2023 12:00 AM	425
4001	Title A	Inactive	123	ABC	02/03/2016 12:00 AM	01/04/2021 12:00 AM	500
4002	Title B	Active	124	ABD	02/04/2014 12:00 AM	03/04/2020 12:00 AM	752
4001	Title A	Draft	123	ABC	06/08/2018 12:00 AM	06/08/2023 12:00 AM	120
4004	Title D	Draft	126	ABF	01/09/2016 12:00 AM	22/02/2022 12:00 AM	681

The desired output would be:

Document ID	Document Title	Status	Contact ID	Contact	Document Valid from	Document Valid to	Values
4003	Tile C	Active	125	ABE	06/07/2017 12:00 AM	31/03/2023 12:00 AM	425
4001	Title A	Inactive	123	ABC	02/03/2016 12:00 AM	01/04/2021 12:00 AM	500
4002	Title B	Active	124	ABD	02/04/2014 12:00 AM	03/04/2020 12:00 AM	752
4004	Title D	Draft	126	ABF	01/09/2016 12:00 AM	22/02/2022 12:00 AM	681

Any help or guidance appreciated

Jonathan-Sherman · ‎04-02-2020

Hi @adamhill3,

I think this is what you're looking to achieve? Use the sample for the "First N Rows" where N = 1 and group by your Document ID column

If this solves your issue please mark the answer as correct, if not let me know! I've attached my workflow for you to download if needed.

Regards,

Jonathan

Kenda · ‎04-02-2020

Hey @adamhill3

I would recommend first using a Formula tool with this expression to create a new field that is a date type:

DateTimeParse([Document Valid from],"%d/%m/%Y")

From there, you can use a Sort to order your data first by Document ID (Asc) then by this new date field (Desc). Finally, use a Sample tool to only keep the first N and make sure to group by Doc ID.

Hope this helps!

adamhill3 · ‎04-02-2020

Thank for your input.

This does not solve the problem though as fr Document B it returns the draft document instead of the active

adamhill3 · ‎04-02-2020

Hi @Kenda

Sorry to not be clear. The date columns have no baring on the desired output. It is purely the status.

Kenda · ‎04-02-2020

Ah gotchya @adamhill3 ! No problem. Instead of sorting by date, then, in the initial formula tool, I would recommend creating a numeric ID column relating to your status with this expression:

iif([Status]="Draft",1,iif([Status]="Active",2,3))

Then sort by Doc ID (Asc) then this new ID (Desc) and use the Sample tool to take the first N and grouping by Doc ID.

So very similar to the previous approach but sorting on the numeric ID we created based on the status now instead of the date.

Jonathan-Sherman · ‎04-02-2020

Hi @adamhill3,

In that case could you create an "ID number" for each status, sort on the new "ID number" for status and use the sample method?

If this solves your issue please mark the answer as correct, if not let me know! I've attached an updated workflow for you to download if needed.

Regards,

Jonathan

adamhill3 · ‎04-14-2020

Apologies for the delayed response.

Thank you @Kenda and @Jonathan-Sherman . These resolve the issue for me.

Alteryx Designer Desktop Discussions

Remove data lines with same key

Re: Row creation

Re: How to select columns dynamically using number...

Re: Batch macro to read 1000+ .xlsx files with var...

Re: Issue when using Block Until Done and Power BI...

Example workflow for setting up a custom list to u...