Alteryx Designer Desktop Discussions

jdelaguila · 02-23-2021

Alteryx Community,

I receive 50 different files on a monthly basis that contain thousands of names and addresses.

I use a piece of software that identifies all duplicate records across the 50 files based on name and address. The software creates a field called [DupeGroup] and assigns a new number to each set of duplicates it finds. For example, if Joe Smith who lives in Washington DC appears in 3 of the files, it puts a "1" in the [DupeGroup] field in all 3 files. The next set of duplicates gets "2", and so on.

That leaves me with something like the following:

FILE#	DUPE GROUP
FILE 1	1
FILE 2	1
FILE 3	1
FILE 4	1
FILE 5	1
FILE 2	2
FILE 3	2
FILE 4	2
FILE 1	3
FILE 3	3
FILE 5	3
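The table above is effectively a list of (file, dupe group) pairs. As a minimal Python sketch, assuming the rows are already available as tuples (the names `rows`, `group_files`, and `files_by_group` are hypothetical, not part of Alteryx or the dedupe software), the files sharing each [DupeGroup] value can be collected like this:

```python
from collections import defaultdict

# Hypothetical copy of the sample table: (file, DupeGroup) pairs.
rows = [
    ("FILE 1", 1), ("FILE 2", 1), ("FILE 3", 1), ("FILE 4", 1), ("FILE 5", 1),
    ("FILE 2", 2), ("FILE 3", 2), ("FILE 4", 2),
    ("FILE 1", 3), ("FILE 3", 3), ("FILE 5", 3),
]

def group_files(pairs):
    """Map each DupeGroup number to the set of files containing that duplicate."""
    files_by_group = defaultdict(set)
    for file_name, dupe_group in pairs:
        # A set means a file listed twice in the same group is kept once.
        files_by_group[dupe_group].add(file_name)
    return dict(files_by_group)

files_by_group = group_files(rows)
for group, files in sorted(files_by_group.items()):
    print(group, sorted(files))
```

Using a set per group is an assumption: if the same person appears twice within one file, that file is still treated as hitting the group only once.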

I'm trying to create a workflow that shows me how many times each file overlaps with every other file, based on the [DupeGroup] field.

Based on the table above, the end result would be:

FILE #	FILE 1	FILE 2	FILE 3	FILE 4	FILE 5
FILE 1	0	1	2	1	2
FILE 2	1	0	2	2	1
FILE 3	2	2	0	2	2
FILE 4	1	2	2	0	1
FILE 5	2	1	2	1	0

Taking File 1 and File 5 as an example, you can see they overlap twice: once in [DupeGroup] = 1 and again in [DupeGroup] = 3.
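The count described above amounts to a self-join on [DupeGroup]: every pair of distinct files that share a group adds 1 to both of their cells, and a file never counts against itself. Below is a minimal Python sketch under that reading; the names `rows` and `overlap_matrix` are hypothetical. In Alteryx the same idea would plausibly map to joining the data to itself on [DupeGroup], filtering out rows where both file names match, summarizing the count per file pair, and cross-tabbing the result, but that mapping is an assumption, not a tested workflow.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical copy of the sample table: (file, DupeGroup) pairs.
rows = [
    ("FILE 1", 1), ("FILE 2", 1), ("FILE 3", 1), ("FILE 4", 1), ("FILE 5", 1),
    ("FILE 2", 2), ("FILE 3", 2), ("FILE 4", 2),
    ("FILE 1", 3), ("FILE 3", 3), ("FILE 5", 3),
]

def overlap_matrix(pairs):
    """Count, for every pair of files, how many DupeGroups they share."""
    # Distinct files per group (a file listed twice in one group counts once).
    files_by_group = defaultdict(set)
    for file_name, dupe_group in pairs:
        files_by_group[dupe_group].add(file_name)

    files = sorted({file_name for file_name, _ in pairs})
    counts = {a: {b: 0 for b in files} for a in files}
    for members in files_by_group.values():
        # Each unordered pair of distinct files in the group overlaps once;
        # combinations() never pairs a file with itself, so the diagonal stays 0.
        for a, b in combinations(sorted(members), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return files, counts

files, counts = overlap_matrix(rows)
print("FILE #\t" + "\t".join(files))
for a in files:
    print(a + "\t" + "\t".join(str(counts[a][b]) for b in files))
```

With all 50 files, plain string sorting would place "FILE 10" before "FILE 2", so a natural-sort key may be preferable for the row and column order.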

Any recommendations on which tool to use to get this done? I've been racking my brain over this for the last couple of days and just can't seem to find a place to start.

Javier