Let’s talk Alteryx Copilot. Join the live AMA event to connect with the Alteryx team, ask questions, and hear how others are exploring what Copilot can do. Have Copilot questions? Ask here!
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Duplicate Analysis

amartinezalq
5 - Atom

Hi All,

 

I am trying to identify duplicate records even when some values are missing. For example, 

 

IdFirstNameLastNameEmailPhone 
1JaneDoeJane.Doe@gmail.com123-345-7891
2Jane Jane.Doe@gmail.com123-345-7891
3JaneDoe 123-345-7891
4Jane  123-345-7891
5JaneSmithJane.Smith@gmail.com567-111-1111

 

Records 1, 2, 3, and 4 are the same and should be flagged. However, when I use the Unique tool and select the four fields, each record is considered unique because the blank/null value is counted. 

 

If I use the filter tool to exclude records that have blank values, then they are not evaluated for the analysis. 

 

Let me know if you have any ideas. Thank you!!

 

 

4 REPLIES 4
ShankerV
17 - Castor

Hi @amartinezalq 

 

In the sample shared, phone is present in all records.

 

If your expectation is to just flag the duplicate records 1,2,3 and 4 then use multirow formula to flag the duplicates based on phone.

 

If you are applying to the huge dataset, please check and confirm which column has same values.

 

Many thanks

Shanker V

ShankerV
17 - Castor

Hi @amartinezalq 

 

One way of doing this.

 

ShankerV_0-1684351781035.png

 

IF [Phone ]=[Row+1:Phone ]
THEN "Flagged as duplicate"
ELSEIF [Phone ]=[Row-1:Phone ]
THEN "Flagged as duplicate"
ELSE ""
ENDIF

 

ShankerV_1-1684351810056.png

 

Many thanks

Shanker V

amartinezalq
5 - Atom

Hi @ShankerV,

 

Thank you for your response. 

 

I am applying it to a large dataset in which there is no column with a consistent value present. In other words, sometimes there is no phone. Additionally, there are other columns that I need to evaluate (address, company, etc.). 

 

IdFirstNameLastNameEmailPhone MailingCity
1JaneDoeJane.Doe@gmail.com123-345-7891Minneapolis
2Jane Jane.Doe@gmail.com123-345-7891 
3JaneDoe 123-345-7891 
4Jane  123-345-7891 
5JaneSmithJane.Smith@gmail.com567-111-1111 
6  Jane.Doe@gmail.com Minneapolis

 

Records 1, 2, 3, 4, and 6 should be flagged. 

 

Any ideas?

ChrisTX
16 - Nebula
16 - Nebula

You may want to try a waterfall approach.... first try to match on name, then email, then phone, and/or a combination of fields. 

 

The records that flow out of the first waterfall are those that did not match on the first criteria.

 

Here's an example workflow under the Help menu.  Each portion of the waterfall is when the workflow takes only the Left or Right output from a Join tool.

 

ChrisTX_0-1684352596653.png

 

 

 

Chris 

Labels
Top Solution Authors