Hi All,
I am trying to identify duplicate records even when some values are missing. For example,
Id | FirstName | LastName | Phone | |
1 | Jane | Doe | Jane.Doe@gmail.com | 123-345-7891 |
2 | Jane | Jane.Doe@gmail.com | 123-345-7891 | |
3 | Jane | Doe | 123-345-7891 | |
4 | Jane | 123-345-7891 | ||
5 | Jane | Smith | Jane.Smith@gmail.com | 567-111-1111 |
Records 1, 2, 3, and 4 are the same and should be flagged. However, when I use the Unique tool and select the four fields, each record is considered unique because the blank/null value is counted.
If I use the filter tool to exclude records that have blank values, then they are not evaluated for the analysis.
Let me know if you have any ideas. Thank you!!
In the sample shared, phone is present in all records.
If your expectation is to just flag the duplicate records 1,2,3 and 4 then use multirow formula to flag the duplicates based on phone.
If you are applying to the huge dataset, please check and confirm which column has same values.
Many thanks
Shanker V
One way of doing this.
IF [Phone ]=[Row+1:Phone ]
THEN "Flagged as duplicate"
ELSEIF [Phone ]=[Row-1:Phone ]
THEN "Flagged as duplicate"
ELSE ""
ENDIF
Many thanks
Shanker V
Hi @ShankerV,
Thank you for your response.
I am applying it to a large dataset in which there is no column with a consistent value present. In other words, sometimes there is no phone. Additionally, there are other columns that I need to evaluate (address, company, etc.).
Id | FirstName | LastName | Phone | MailingCity | |
1 | Jane | Doe | Jane.Doe@gmail.com | 123-345-7891 | Minneapolis |
2 | Jane | Jane.Doe@gmail.com | 123-345-7891 | ||
3 | Jane | Doe | 123-345-7891 | ||
4 | Jane | 123-345-7891 | |||
5 | Jane | Smith | Jane.Smith@gmail.com | 567-111-1111 | |
6 | Jane.Doe@gmail.com | Minneapolis |
Records 1, 2, 3, 4, and 6 should be flagged.
Any ideas?
You may want to try a waterfall approach.... first try to match on name, then email, then phone, and/or a combination of fields.
The records that flow out of the first waterfall are those that did not match on the first criteria.
Here's an example workflow under the Help menu. Each portion of the waterfall is when the workflow takes only the Left or Right output from a Join tool.
Chris
User | Count |
---|---|
107 | |
85 | |
76 | |
54 | |
40 |