Alteryx Designer Desktop Discussions

rreissig001 · ‎04-13-2020

Hello all,

I have a two Excel documents each of which contain running detailed time capture. I would like to remove anything in Document 1 (which runs from March 16 through April 3) from Document 2 (which runs from March 16 through April 10). See below data sample set:

Field 1	Field 2	Field 3	Field 4	Field 5
a	b	c	d	e
a	b	c	d	e
a	a	a	a	a
a	a	a	a	a
z	y	x	w	v
a	c	e	d	b

I would like to remove ALL items that are duplicates in ALL fields. Therefore, the output would remove the first 4 records as lines 1 and 2 and lines 3 and 4 are exact duplicates. The sample output would be:

Field 1	Field 2	Field 3	Field 4	Field 5
z	y	x	w	v
a	c	e	d	b

I have searched the community and tried everything I can think of but cannot seem to do this. The unique tool does not work exactly. I currently unioned the two datasets, used the unique filter on ALL fields and then joined them back together, whereas I thought the Left output would give me records unique to the first dataset but I currently have more records than I would expect.

ggruccio · ‎04-13-2020

Hi @rreissig001,

After you Union the files, try summarizing using GROUP BY on all fields and put a COUNT on at least one field.

Then filter for only the records where the count =1. I've posted a simplified view of it.

...thinking about it a bit more...instead of unioning you could also join file 1 and file 2 on all 5 fields... Assuming file 2 is coming in the right side, then anything that doesn't join on all 5 fields on the right output would also do this.

Alteryx Designer Desktop Discussions

Remove All Duplicate Records Based on All Fields

Re: How to select columns dynamically using number...

Re: Issue when using Block Until Done and Power BI...

Example workflow for setting up a custom list to u...

Re: Firm names parse

Re: Help with Multi-Row formula