Alteryx Designer Desktop Discussions

dcoelho · ‎06-06-2019

Here's an example of what my data looks like (there are more fields but I only included the relevant ones):

First Name	Last Name	Primary Advisor	Gender	Home Address 1	Home City	Home Phone	Home State	Home Zip	Mobile Phone
Joan	Smith	Jones, Jimmie		Po Box 01332	Fort Myers	+1 (239) 562-2834	FL	33919
Joan	Smith		Female	PO Box 01332	Ft. Myers	+1 (239) 184-9749	FL	33919	+1 (239) 489-4700

Ideally, I'd want the output to be something like the following:

First Name	Last Name	Primary Advisor	Gender	Home Address 1	Home City	Home Phone	Home State	Home Zip	Mobile Phone
Joan	Smith	Jones, Jimmie	Female	PO Box 01332	Fort Myers	+1 (239) 562-2834	FL	33919	+1 (239) 489-4700

Basically, fill in the blanks/missing fields. If there's a conflict (both rows have data for the same field), I'm okay with just keeping the data from the first row, at least for that field.

Another issue is there is no guarantee of first + last name being unique. For example, these are the other records I have for "Joan Smith", which appears to be at least 7 unique people:

First Name	Last Name	Primary Advisor	Gender	Home Address 1	Home City	Home Phone	Home State	Home Zip	Mobile Phone
Joan	Smith	Bob Studer		1856 Flamingo Place. Apt. #3	Miami Beach	+1 305-698-0354	FL	33140
Joan	Smith	Steve Gilbert
Joan	Smith	Jones, Jimmie		Po Box 01332	Fort Myers	+1 (239) 562-2834	FL	33919
Joan	Smith		Female	PO Box 01332	Ft. Myers	+1 (239) 184-9749	FL	33919	+1 (239) 489-4700
Joan	Smith			1315 S Clewis Ct Apt 101	Tampa	+1 8132365385	FL	33629
Joan	SMITH	Jack Henry		230 ROUTE 579	HAMPTON	+1 906-854-7400	NJ	08827-4345
Joan	Smith	Porpora Paul A.		12 Lakeside Road	Mahopac	+1 (914) 369-4336	NY	10541
Joan A	Smith			One Metrotech Center	Brooklyn		NY	11201

So my question is, how would I go about doing this? I've been looking into fuzzy matching but I'm not sure it'll solve my problem completely. I'm open to any suggestions, thank you.

wdavis · ‎06-07-2019

Hi @dcoelho,

I think you can achieve this with a combination of a Fuzzy Match tool and a Formula tool. I have copied your table into the input data tool in the workflow attached.

The Fuzzy Match tool is setup in this example to look at Home Address (this can be amended to what you would prefer the fields to match on). It is also configured to 'Output unmatched records'. This means in the output of the tool you can see 6 records in two columns;

The first row is 2 record ID's which have been matched; RecordID = 3 and RecordID2 = 4 - This is the example you gave in your message
The following 5 records all have the same RecordID and RecordID2 as they have not fuzzy matched
We are missing 1 field due to this not having a Home Address (this is added back in later with a union

You then can join the data from the record IDs back in, so that you have your original data set, next to a copy of the data set from anything that has matched.

Using the Formula tool you can then add in IF statements to replace null values with anything that is populated from the matched Record ID (I have only added in Gender and Mobile Phone for this example)

I hope this helps. Let me know if you have any questions.

Regards

Will

dcoelho · ‎06-07-2019

@wdavis, thank you for your reply. It shows me how little I know about Alteryx because that workflow is more complex than anything I've tried for this solution (and it's not even that complicated of a workflow).

Unfortunately I'm not sure it's working for my full data set. Running my data through this workflow brought it from 153,000 records to a little under 12,000. At first I thought maybe I do have that many duplicates. However, running the original data through the Unique tool and selecting First Name, Last Name, and Address as the unique fields I got 137,000 unique records matching that criteria.

Looking through the output, it appears that it actually deleted all 8 Joan Smith records.

wdavis · ‎06-10-2019

Hi @dcoelho,

It might be worth having a look at some of our 'Tool Mastery' guides available on community, which are really useful when starting to use new tools, especially those outside of the Favourites toolbar. This guide on Fuzzy Matching should hopefully help with your understanding - https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Tool-Mastery-Fuzzy-Match/ta-p/45485

The Unique tool is likely to be pulling through more unique rows that are in fact not unique and could have differences due to slight spelling changes, e.g Joan Smith and joan smith would both be considered Unique rows within the tool due to the case differences. Likewise, if any of the rows are missing data for one of the 3 fields, then it will identify these as unique values.

Which fields are you currently using in your fuzzy match tool?

Regards

Will

dcoelho · ‎06-17-2019

Will,

I ended up going a different route to solve this issue, but I appreciate your assistance regardless. Thank you for your help!

vujennyfer · ‎01-20-2021

Hi dcoelho,

Do you mind sharing what route you ended up taking to solve this issue? I have the exact same issue you have illustrated above and would appreciate any insight! Thank you.

Alteryx Designer Desktop Discussions

Fuzzy Match Duplicates and then condense into one record