Hello community,
Maybe this is very simple, but I've looked at it for hours and I'm slowly losing my mind, so I need to ask for support.
What I need is to remove duplicates based on the reference so that only one record remains per reference, but I need to keep the line that has a comment, not just remove duplicates and keep the first one. If there are multiple commented lines for the same reference, I'd keep the commented line with the highest Age in the other column.
If multiple lines exist and none has a comment, I'd keep the one with the highest Age, but again there could be duplicates there too, so I need to keep only one.
The comments can be generated slightly differently each time, which also creates problems there.
Anyone fancy having a crack at it? :)
References | Counterparty Name | Latest Comment Text | Age |
55207225 | Jack Sparrow | | 0 |
55207225 | Jack Sparrow | | 4 |
55207225 | Jack Sparrow | | 5 |
55207324 | Jack Sparrow | | 5 |
55207324 | Jack Sparrow | | 5 |
55207324 | Jack Sparrow | | 0 |
89819654 | Jack Sparrow | | 192 |
90024782 | Persona J | | 7 |
90024782 | Persona J | | 12 |
91453297 | John Smith LTD | | 6 |
91453297 | John Smith LTD | | 14 |
91453297 | John Smith LTD | Client signed Please verify. | 39 |
91472861 | Music House LTD | Client signed Please verify….. | 6 |
91474819 | John Smith LTD | | 6 |
91474819 | John Smith LTD | Client signed Please verify. | 34 |
91485774 | John Smith LTD | | 6 |
91485774 | John Smith LTD | | 11 |
91485774 | John Smith LTD | | 14 |
91485774 | John Smith LTD | Client signed Please verify. | 45 |
91485774 | John Smith LTD | Client signed Please verify…... | 45 |
91485774 | John Smith LTD | Client signed Please verif---- | 34 |
91521718 | Jane Doe Inc. | No comment | 101 |
91539944 | ABC Corp | To be seen | 684 |
91544027 | XYZ, SA | | 0 |
91544027 | ABC Corp | Client signed Please verify. | 45 |
91544027 | ABC Corp | Client signed Please verify. | 13 |
91544027 | ABC Corp | NCR | 181 |
91544027 | ABC Corp | | 242 |
91544027 | ABC Corp | | 669 |
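Outside of Alteryx, the keep-rules described above can be sketched in plain Python. This is only an illustrative sketch, not the workflow itself, and the variable names are my own assumptions:

```python
# Rule: per reference, a commented row beats an uncommented one;
# ties are broken by the higher Age.
rows = [
    # (reference, counterparty, comment, age) -- a subset of the sample data
    ("55207225", "Jack Sparrow", "", 0),
    ("55207225", "Jack Sparrow", "", 4),
    ("55207225", "Jack Sparrow", "", 5),
    ("91453297", "John Smith LTD", "", 6),
    ("91453297", "John Smith LTD", "", 14),
    ("91453297", "John Smith LTD", "Client signed Please verify.", 39),
]

best = {}
for ref, name, comment, age in rows:
    # rank tuple: has-comment first, then age; tuple comparison does the rest
    rank = (comment != "", age)
    if ref not in best or rank > best[ref][0]:
        best[ref] = (rank, (ref, name, comment, age))

deduped = [record for _, record in best.values()]
```

With this ranking, `55207225` keeps the age-5 row (no comments exist for it) and `91453297` keeps the commented age-39 row.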
Thanks,
Marcos
Hi @abacon ,
Not sure this works. Looking at the outcome of your solution, ABC Corp for example was kept with age 669, when the right one to keep would've been age 181, the "NCR" comment.
I also don't really see the logic of the "Latest Comment is not null" filter, because its False output is empty for some reason yet is still connected to the Join tool.
Can you check?
@MarcosC_ Good call, turn off the replace nulls option in the Data Cleansing tool. I added the tool to remove punctuation; some of the lines were duplicates apart from the extra periods in them.
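For reference, the punctuation clean-up that makes near-identical comments (e.g. "Client signed Please verify." vs "Client signed Please verify…..") compare as equal could be sketched like this in Python. The `normalize` helper is hypothetical, not an Alteryx function:

```python
import string

def normalize(comment):
    # Strip ASCII punctuation plus the Unicode ellipsis, collapse whitespace,
    # and lowercase, so near-identical comments compare equal
    # (this mirrors the Data Cleansing "remove punctuation" step).
    cleaned = comment.translate(str.maketrans("", "", string.punctuation + "…"))
    return " ".join(cleaned.split()).lower()
```

Deduplicating on `normalize(comment)` instead of the raw text collapses rows that differ only in trailing periods.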
Hi @abacon ,
Removing that part in the data cleansing worked; however, I'm running into another issue with my actual dataset.
In some cases I have four identical references, same client, all lines commented, and the age is exactly the same, so all four lines still appear.
Also, the actual file has a lot more columns. I was trying to join those after your workflow example, but obviously that gets messed up.
How can I fix this scenario?
@MarcosC_ There may be a better way, but does a Unique tool with all the identical fields selected work? That will remove duplicates where the selected columns are equal.
Or you could try a Summarize tool, grouping on the fields you need; that will summarize all the duplicates together.
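As a rough illustration, the Unique-tool behaviour (keep the first row for each combination of the selected fields) can be sketched in Python; `unique_on` and the field names are my own illustrative choices:

```python
def unique_on(rows, key_fields):
    # Keep the first row for each combination of the selected fields,
    # like the Unique tool: later rows with an identical key are dropped.
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"ref": "91544027", "comment": "NCR", "age": 181},
    {"ref": "91544027", "comment": "NCR", "age": 181},  # exact duplicate
    {"ref": "91544027", "comment": "NCR", "age": 242},
]
deduped = unique_on(rows, ["ref", "comment", "age"])
```

Here the exact-duplicate second row is dropped, while the age-242 row survives because its key differs.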
Bacon
Yeah, sorting it through the Unique worked. After joining both datasets, running a Unique on reference + latest comment worked and cleared most duplicates.
Thanks
Hi @abacon ,
Still running into trouble. I tried multiple Unions and Uniques to remove them, but once the data gets joined, duplicates remain.
I've attached the files with masked data. I need to remove the duplicates while keeping the latest line that has a comment. If the age is the same and both the reference and the comment are identical or similar, I should keep just one, perhaps based on the system entry date (take the latest): that will be a unique date and hour, so it probably can't be mirrored the way these keep being.
I'm attaching the data after some cleaning, just before the duplicate-removal attempt, and then again after running it through your flow.
I need to keep all these columns.
Any ideas?
I would look at adding a Summarize tool, grouping on your relevant fields (the ones you want to keep), then taking a Max on the date field you want to use. This will remove duplicates and keep only one record; if multiple records come out, they aren't duplicates.
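The group-and-take-latest idea can be sketched in Python as well. This is only a sketch under the assumption that the system entry date is a unique timestamp per record; `latest_per_group` and the field names are hypothetical:

```python
from datetime import datetime

def latest_per_group(rows, group_fields, date_field):
    # Group on the chosen fields and keep the record with the max date,
    # mirroring a Summarize tool that groups and takes Max on the date.
    best = {}
    for row in rows:
        key = tuple(row[f] for f in group_fields)
        if key not in best or row[date_field] > best[key][date_field]:
            best[key] = row
    return list(best.values())

rows = [
    {"ref": "91544027", "comment": "NCR", "entry": datetime(2023, 5, 1, 9, 30)},
    {"ref": "91544027", "comment": "NCR", "entry": datetime(2023, 5, 2, 14, 5)},
]
kept = latest_per_group(rows, ["ref", "comment"], "entry")
```

Because the entry timestamp includes the hour, ties like the identical-age rows above resolve to a single latest record.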
Bacon