Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Remove All Duplicate Records Based on All Fields

rreissig001
5 - Atom

Hello all,

 

I have a two Excel documents each of which contain running detailed time capture.  I would like to remove anything in Document 1 (which runs from March 16 through April 3) from Document 2 (which runs from March 16 through April 10).  See below data sample set:

 

Field 1Field 2Field 3Field 4Field 5
abcde
abcde
aaaaa
aaaaa
zyxwv
acedb

 

I would like to remove ALL items that are duplicates in ALL fields.  Therefore, the output would remove the first 4 records as lines 1 and 2 and lines 3 and 4 are exact duplicates.  The sample output would be:

 

Field 1Field 2Field 3Field 4Field 5
zyxwv
acedb

 

I have searched the community and tried everything I can think of but cannot seem to do this.  The unique tool does not work exactly.  I currently unioned the two datasets, used the unique filter on ALL fields and then joined them back together, whereas I thought the Left output would give me records unique to the first dataset but I currently have more records than I would expect.

 
 
1 REPLY 1
ggruccio
ACE Emeritus
ACE Emeritus

Hi @rreissig001,

 

After you Union the files, try summarizing using GROUP BY on all fields and put a COUNT on at least one field.

 

Then filter for only the records where the count =1.  I've posted a simplified view of it.

 

ggruccio_2-1586787243796.png

 

ggruccio_3-1586787295375.png

 

 

 

ggruccio_1-1586787217128.png

 

ggruccio_0-1586787179386.png

 

...thinking about it a bit more...instead of unioning you could also join file 1 and file 2 on all 5 fields...  Assuming file 2 is coming in the right side, then anything that doesn't join on all 5 fields on the right output would also do this.

 

Labels
Top Solution Authors