We are celebrating the 10-year anniversary of the Alteryx Community! Learn more and join in on the fun here.
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Removing duplicates based on variable conditions

MarcosC_
7 - Meteor

Hello community,

 

Maybe this is very simple but I've looked at it for hours and losing my mind slowly so need to ask support.

 

What I need is to remove duplicates based on the reference to only have 1 record per reference but I need to keep the line that has comments and not just plain remove and keep the first one. If for the same reference, there are multiple lines with comments, I'd keep the one that has comments and highest age on the other column.

 

If multiple lines exist and no comments, I'd keep the highest age - but again, there could be duplicates there also so I need to keep only 1.

The comments can all be differently generated which also creates problems there.

 

Anyone fancy having a crack at it? :)

 

ReferencesCounterparty NameLatest Comment TextAge
55207225Jack Sparrow 0
55207225Jack Sparrow 4
55207225Jack Sparrow 5
55207324Jack Sparrow 5
55207324Jack Sparrow 5
55207324Jack Sparrow 0
89819654Jack Sparrow 192
90024782Persona J 7
90024782Persona J 12
91453297John Smith LTD 6
91453297John Smith LTD 14
91453297John Smith LTDClient signed Please verify.39
91472861Music House LTDClient signed Please verify…..6
91474819John Smith LTD 6
91474819John Smith LTDClient signed Please verify.34
91485774John Smith LTD 6
91485774John Smith LTD 11
91485774John Smith LTD 14
91485774John Smith LTDClient signed Please verify.45
91485774John Smith LTDClient signed Please verify…...45
91485774John Smith LTDClient signed Please verif----34
91521718Jane Doe Inc.No comment101
91539944ABC CorpTo be seen684
91544027XYZ, SA 0
91544027ABC CorpClient signed Please verify.45
91544027ABC CorpClient signed Please verify.13
91544027ABC CorpNCR181
91544027ABC Corp 242
91544027ABC Corp 669

 

Thanks,

Marcos

8 REPLIES 8
abacon
12 - Quasar

@MarcosC_ Here is a way to do it.

 

Bacon

MarcosC_
7 - Meteor

Hi @abacon ,

 

Not sure this works. Looking at your solution on the outcome, ABC Corp for example was taken with age group 669 when the right one to be taken would've been age 181 - "NCR" comment.

 

Not really see the logic on the filter Latest comment is not null because the False is empty for some reason and is being connected to the join tool.

 

Can you check?

abacon
12 - Quasar

@MarcosC_ Good call, turn off the replace nulls options in the data cleansing tool. I added it to remove punctuation, some of the lines were duplicates aside from multiple periods in them.image.png

MarcosC_
7 - Meteor

Hi @abacon ,

Removing that part on the data cleansing worked however, I'm running into another issue with my actual dataset.

 

I have in some cases 4 identical references, same client, all lines commented and the age is the exact same. So all 4 lines still appear.

 

Also the actual file has a lot more columns. I was trying to Join those after your workflow example but obviouly that gets messed up.

 

How can I fix this scenario?

abacon
12 - Quasar

@MarcosC_ There may be a better way but does a unique tool to with the fields that are all identical selected work? That will remove duplicates where the selected columns are equal.


Or you could try a summarize tool, grouping on the fields you need, that will summarize all the duplicates together.

 

Bacon

MarcosC_
7 - Meteor

Yeah sorting through the Unique worked. After joining both datasets doing Unique ref + latest comments worked and cleared most duplicates.

Thanks

MarcosC_
7 - Meteor

Hi @abacon ,

 

Continuing to run into troubles. Tried multiple unions and uniques trying to remove it but then data gets joined, duplicates remain.

Attached the file files with masked data. I need to remove the duplicates while leaving the latest one that has a comment. If the age is the same and both reference and comment is identical or similar, I should just keep one. Perhaps based on the system entry date - get latest. that will be a unique date and hour so probably can't be mirrored like these keep being.

 

Attached I send the data after some clean data cleaning just before the attempt of duplicate removal, and afterwards running it with your flow.

I need to keep all these collumns.

 

Any ideas?

abacon
12 - Quasar

I would look to adding a summarize tool, grouping on your relevant fields - the ones that you want to keep. Then do a max on the date field you want to take from. This will remove duplicates and only keep one record, if multiple spit out, they aren't duplicates.

 

Bacon

Labels
Top Solution Authors