Hello all,
I have a two Excel documents each of which contain running detailed time capture. I would like to remove anything in Document 1 (which runs from March 16 through April 3) from Document 2 (which runs from March 16 through April 10). See below data sample set:
Field 1 | Field 2 | Field 3 | Field 4 | Field 5 |
a | b | c | d | e |
a | b | c | d | e |
a | a | a | a | a |
a | a | a | a | a |
z | y | x | w | v |
a | c | e | d | b |
I would like to remove ALL items that are duplicates in ALL fields. Therefore, the output would remove the first 4 records as lines 1 and 2 and lines 3 and 4 are exact duplicates. The sample output would be:
Field 1 | Field 2 | Field 3 | Field 4 | Field 5 |
z | y | x | w | v |
a | c | e | d | b |
I have searched the community and tried everything I can think of but cannot seem to do this. The unique tool does not work exactly. I currently unioned the two datasets, used the unique filter on ALL fields and then joined them back together, whereas I thought the Left output would give me records unique to the first dataset but I currently have more records than I would expect.
Solved! Go to Solution.
Hi @rreissig001,
After you Union the files, try summarizing using GROUP BY on all fields and put a COUNT on at least one field.
Then filter for only the records where the count =1. I've posted a simplified view of it.
...thinking about it a bit more...instead of unioning you could also join file 1 and file 2 on all 5 fields... Assuming file 2 is coming in the right side, then anything that doesn't join on all 5 fields on the right output would also do this.
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
5 |