Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Picking which duplicate to keep

StephMuldoon
5 - Atom

Hi everyone,

 

I need to remove duplicates of company names but I need to prioritize which duplicate to keep. For example, for the below company showing up 4 times, I need to make sure the customer domain is kept, then the VAR would be the next priority and then the Active Leads. How would I do this?

 

NameDomain
RingCentralActive Lead
RingCentralCustomer
RingCentralCustomer
RingCentralVAR
8 REPLIES 8
acastelazo
8 - Asteroid

One way to do this is to create a lookup table with your domains and a "rank" based on which ones are prioritized. You can then join it to your data using the Domain name and use a summary tool to group it by "Name" and take the "min" of the Domain Rank. You can then join it back to your lookup table to get the Domain associated with a given rank.

 

This is what your lookup table might look like:

 

Domain RankDomain
1Customer
2VAR
3Active Lead

 

Capture.JPG

estherb47
15 - Aurora
15 - Aurora

Hi @StephMuldoon 

 

Another approach, a bit similar to @acastelazo 's, is to use a priority lookup table, join in the priority field, and then Sort on the company name (ascending or descending) and the priority (ascending if you set the highest priority to 1, descending if you give it the highest number). Then use the sample tool to pull the first record only, grouping by company name.

 

image.png

 

Please let me know if this works for you. Cheers!

 

Esther

acastelazo
8 - Asteroid

I like this solution, too! According to The Periodic Table of Alteryx Tools, both the summary and sort tools are both blocking tools, which means that the workflow is paused until all of the records are processed. If the data set is larger than what your computer can hold in memory, this means the engine will have to write to the disk before moving on. So, there may not be efficiency gains with either method, but I love that there is always a different way to get to where you need to go with Alteryx!

neilgallen
12 - Quasar
We could be splitting (very fine) hairs here l, but would a find/replace be faster vs a join?

It allows for record streaming, as well as you wouldn’t drop any records that didn’t have a domain in the lookup reference. I believe it also won’t sort the customer list as well.

Now we are deep in the weeds on how the engine works, but this is where it gets fun, right?
estherb47
15 - Aurora
15 - Aurora

@neilgallen Splitting hairs, yes, but potentially a big difference with a large data set. With my tiny one, a Find/Replace took 1.7 seconds, and with the join it took 1.9 seconds.

 

@StephMuldoon I'm not coming up with a way to do this that doesn't involve a blocking tool of some sort (even Find/Replace is conditional blocking), but I'll keep thinking on it!

 

Cheers!

Esther

candry
5 - Atom

This is so frustrating. A better solution would be to add the option in the Unique or Only Unique tools to "keep first," "keep last," "keep both," or "delete both"

neilgallen
12 - Quasar

@candry Understood. You could suggest that on the designer ideas site!

 

That said, if you wanted to keep only the first or last record, a sample tool would allow for that.

IlanaPregen
5 - Atom

love you. thank you.

Labels