Alteryx Designer Desktop Discussions

SOLVED

Unique Tool Not Identifying duplicates from two separate data inputs

jonathanl14
Alteryx Alumni (Retired)

I'm trying to identify duplicates of Company Names after combining two months of similar data. I used a Union tool to combine the two months of data. I've performed these steps in Excel and identified 790 duplicates; however, the Unique tool is only identifying 523. Is there a way around this? Are the two separate inputs of data treated as unique fields?

 

Thanks!

rdoptis
11 - Bolide

After unioning, the Unique Tool will treat the combined dataset as one dataset. Any chance some of the fields have leading / trailing spaces that the Cleanse tool or other string formulas can clean up?
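To illustrate why stray spaces matter, here is a minimal Python sketch (not Alteryx code): strings that differ only by leading or trailing spaces count as different values until they are trimmed. The sample names and the `normalize` helper are hypothetical, and `str.strip()` simply stands in for whatever the Cleanse tool or a string formula would do.

```python
def normalize(name: str) -> str:
    # Hypothetical helper: trim leading/trailing whitespace before comparing.
    return name.strip()

# Made-up company names that differ only by surrounding spaces.
raw = ["Acme Hospital", "Acme Hospital ", " Acme Hospital"]

distinct_before = set(raw)
distinct_after = {normalize(n) for n in raw}

print(len(distinct_before))  # 3: the spaces make each string distinct
print(len(distinct_after))   # 1: after trimming, all three are the same name
```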

 

If that doesn't work, can you share an example that Excel finds and that Alteryx doesn't identify? 

jonathanl14
Alteryx Alumni (Retired)

I've attached some screenshots of the workflow I've created below. I tried cleansing the data but received the same results. Take the hospital example below: there are multiple instances in which the Unique tool captures only some of the duplicate companies but identifies the exact same company as unique. However, when I apply conditional formatting to the same data in Excel, they show up as duplicates.

 

 

[Screenshots: Unique Company.PNG, Duplicate Company.PNG]

estherb47
15 - Aurora

Hi @jonathanl14 

Based on your screenshots, the D output of the Unique tool is showing the duplicates, which is what it's intended to do. The U output will show the unique values.

I often recommend the Only Unique tool from the Crew Macros pack. It's a fabulous tool.

Let me know if that helps, or how we can help troubleshoot further.

 

Cheers,

Esther

jonathanl14
Alteryx Alumni (Retired)

Thanks for the reply @estherb47! If I'm only selecting 'Company Name' as the Unique Field, shouldn't the hospital that's showing up as unique technically be a duplicate item as well?

[Screenshot: Unique Field.PNG]

estherb47
15 - Aurora

@jonathanl14, in the logic of the Unique tool that comes with Alteryx, it doesn't. The U output includes every distinct item once, and the D output contains only the duplicates. So a company name that appears only once in the list will appear only in the U output.

That's why I love the Only Unique tool. Its U output has only truly unique items, i.e. only what appears once. Everything else streams to the D output.

[Screenshot: image.png]
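For comparison, here is a rough Python sketch of the Only Unique behavior as described in this reply: a value goes to U only if it appears exactly once, and every record of a repeated value goes to the other output. The function name `only_unique` is hypothetical, and this is not the macro's actual code.

```python
from collections import Counter

def only_unique(records, key):
    # Count how often each key occurs across the whole input.
    counts = Counter(key(r) for r in records)
    # Keys seen exactly once are "truly unique".
    u_out = [r for r in records if counts[key(r)] == 1]
    # Every record of a repeated key (including its first occurrence) goes here.
    d_out = [r for r in records if counts[key(r)] > 1]
    return u_out, d_out

data = ["A", "A", "B", "B", "B", "C", "D", "E"]
u, d = only_unique(data, key=lambda x: x)
print(u)  # ['C', 'D', 'E']
print(d)  # ['A', 'A', 'B', 'B', 'B']
```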
Cheers,

Esther

jonathanl14
Alteryx Alumni (Retired)

@estherb47 Thanks again for the response! I agree and follow what you're saying. By that logic and understanding of this tool, then, if I select only the data for companies and run the same Unique tool, shouldn't this hospital not show up as unique?

 

Thanks!

[Screenshots: Unique Hospital.PNG, Duplicate Hospital.PNG]

estherb47
15 - Aurora

See, this is where the tool gets confusing.

If my data is A, A, B, B, B, C, D, E, and I send it through the Unique tool, the U output will list A, B, C, D, E, and the D output will list A, B, B.

So basically, the U is saying list every item only once. It's like putting a Summarize tool on and choosing Group By on one field.

 

The D output lists only the extra duplicates. So if there are 3 "United Health Care", then the U will contain 1 of them (the first one), and the D will contain the other 2.
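The split described above can be mimicked in a few lines of Python. This is a sketch of the observable behavior, assuming records are processed in input order; it is not Alteryx's actual implementation, and `unique_tool` and `key` are hypothetical names.

```python
def unique_tool(records, key):
    # First record for each key goes to U; every later repeat goes to D.
    seen = set()
    u_out, d_out = [], []
    for rec in records:
        k = key(rec)
        if k in seen:
            d_out.append(rec)
        else:
            seen.add(k)
            u_out.append(rec)
    return u_out, d_out

data = ["A", "A", "B", "B", "B", "C", "D", "E"]
u, d = unique_tool(data, key=lambda x: x)
print(u)  # ['A', 'B', 'C', 'D', 'E']
print(d)  # ['A', 'B', 'B']
```

With three identical "United Health Care" records, the same function puts the first one in `u` and the other two in `d`.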

Clear as mud, right? I hope I've explained it well.

Cheers,

Esther

danilang
19 - Altair

Hi @jonathanl14 

 

I think that maybe the confusion lies with the definition of "unique".  The common definition is that something is unique if there is only one of them.  The Alteryx definition of unique is "return a subset of the data such that each item appears only once in the subset". That's what the Unique tool outputs from the U anchor.  All the extras that are removed from the list are returned out the D.   
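To see how the two definitions lead to different counts, here is a small Python sketch on made-up names. It assumes that a spreadsheet-style duplicate highlight flags every occurrence of a repeated value (the common definition), while the Unique tool's D output holds only the extras after the first occurrence. Under that assumption, the gap between the two counts equals the number of distinct values that repeat, which may be what separates the 790 and 523 figures in the original question.

```python
from collections import Counter

names = ["Acme", "Acme", "Beta", "Beta", "Beta", "Gamma"]  # hypothetical data
counts = Counter(names)

# Common definition: every row whose value occurs more than once is a "duplicate".
all_occurrences = sum(c for c in counts.values() if c > 1)
# Unique-tool definition: only the rows after the first occurrence (the D output).
extras_only = sum(c - 1 for c in counts.values() if c > 1)
# The difference is the number of distinct values that repeat.
repeated_values = sum(1 for c in counts.values() if c > 1)

print(all_occurrences, extras_only, repeated_values)  # 5 3 2
```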

 

If you want to get the list of truly unique items from the data set, which is what the Only Unique macro does, you need to take the list from the U anchor and remove every item that exists in the D anchor.

 

[Screenshot: WF.png]
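The recipe above (take the U output and drop every value that also appears in the D output) can be sketched in Python with plain string values. The `unique_tool` helper is a hypothetical stand-in for the tool's first-occurrence-to-U, repeats-to-D behavior, not Alteryx code.

```python
def unique_tool(records):
    # Hypothetical stand-in: first occurrence of each value -> U, later repeats -> D.
    seen, u_out, d_out = set(), [], []
    for r in records:
        if r in seen:
            d_out.append(r)
        else:
            seen.add(r)
            u_out.append(r)
    return u_out, d_out

data = ["A", "A", "B", "B", "B", "C", "D", "E"]
u, d = unique_tool(data)

# Remove from U every value that also shows up in D (an anti-join on the value).
d_keys = set(d)
truly_unique = [r for r in u if r not in d_keys]
print(truly_unique)  # ['C', 'D', 'E']
```

The result matches what you get by counting occurrences and keeping only the values seen exactly once.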

 

Dan
