Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Fuzzy Match Duplicates and then condense into one record

dcoelho
5 - Atom

Here's an example of what my data looks like (there are more fields but I only included the relevant ones):

 

First NameLast NamePrimary AdvisorGenderHome Address 1Home CityHome PhoneHome StateHome ZipMobile Phone
JoanSmithJones, Jimmie Po Box 01332Fort Myers+1 (239) 562-2834FL33919 
JoanSmith FemalePO Box 01332Ft. Myers+1 (239) 184-9749FL33919+1 (239) 489-4700

 

Ideally, I'd want the output to be something like the following: 

 

First NameLast NamePrimary AdvisorGenderHome Address 1Home CityHome PhoneHome StateHome ZipMobile Phone
JoanSmithJones, JimmieFemalePO Box 01332Fort Myers+1 (239) 562-2834FL33919+1 (239) 489-4700

 

Basically, fill in the blanks/missing fields. If there's a conflict (both rows have data for the same field), I'm okay with just keeping the data from the first row, at least for that field. 

 

Another issue is there is no guarantee of first + last name being unique. For example, these are the other records I have for "Joan Smith", which appears to be at least 7 unique people:

 

First NameLast NamePrimary AdvisorGenderHome Address 1Home CityHome PhoneHome StateHome ZipMobile Phone
JoanSmithBob Studer 1856 Flamingo Place. Apt. #3Miami Beach+1 305-698-0354FL33140 
JoanSmithSteve Gilbert       
JoanSmithJones, Jimmie Po Box 01332Fort Myers+1 (239) 562-2834FL33919 
JoanSmith FemalePO Box 01332Ft. Myers+1 (239) 184-9749FL33919+1 (239) 489-4700
JoanSmith  1315 S Clewis Ct Apt 101Tampa+1 8132365385FL33629 
JoanSMITHJack Henry 230 ROUTE 579HAMPTON+1 906-854-7400NJ08827-4345 
JoanSmithPorpora Paul A. 12 Lakeside RoadMahopac+1 (914) 369-4336NY10541 
Joan ASmith  One Metrotech CenterBrooklyn NY11201 

 

 

So my question is, how would I go about doing this? I've been looking into fuzzy matching but I'm not sure it'll solve my problem completely. I'm open to any suggestions, thank you.

5 REPLIES 5
wdavis
Alteryx
Alteryx

Hi @dcoelho,

 

I think you can achieve this with a combination of a Fuzzy Match tool and a Formula tool. I have copied your table into the input data tool in the workflow attached.

 

The Fuzzy Match tool is setup in this example to look at Home Address (this can be amended to what you would prefer the fields to match on). It is also configured to 'Output unmatched records'. This means in the output of the tool you can see 6 records in two columns;

  • The first row is 2 record ID's which have been matched; RecordID = 3 and RecordID2 = 4 - This is the example you gave in your message
  • The following 5 records all have the same RecordID and RecordID2 as they have not fuzzy matched
  • We are missing 1 field due to this not having a Home Address (this is added back in later with a union

You then can join the data from the record IDs back in, so that you have your original data set, next to a copy of the data set from anything that has matched.

 

Using the Formula tool you can then add in IF statements to replace null values with anything that is populated from the matched Record ID (I have only added in Gender and Mobile Phone for this example)

 

I hope this helps. Let me know if you have any questions.

 

Regards

Will

dcoelho
5 - Atom

@wdavis, thank you for your reply. It shows me how little I know about Alteryx because that workflow is more complex than anything I've tried for this solution (and it's not even that complicated of a workflow).

 

Unfortunately I'm not sure it's working for my full data set. Running my data through this workflow brought it from 153,000 records to a little under 12,000. At first I thought maybe I do have that many duplicates. However, running the original data through the Unique tool and selecting First Name, Last Name, and Address as the unique fields I got 137,000 unique records matching that criteria. 

 

Looking through the output, it appears that it actually deleted all 8 Joan Smith records.

wdavis
Alteryx
Alteryx

Hi @dcoelho,

 

It might be worth having a look at some of our 'Tool Mastery' guides available on community, which are really useful when starting to use new tools, especially those outside of the Favourites toolbar. This guide on Fuzzy Matching should hopefully help with your understanding - https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Tool-Mastery-Fuzzy-Match/ta-p/45485

 

The Unique tool is likely to be pulling through more unique rows that are in fact not unique and could have differences due to slight spelling changes, e.g Joan Smith and joan smith would both be considered Unique rows within the tool due to the case differences. Likewise, if any of the rows are missing data for one of the 3 fields, then it will identify these as unique values.

 

Which fields are you currently using in your fuzzy match tool?

 

Regards

Will

 

 

 

 

dcoelho
5 - Atom

Will, 

 

I ended up going a different route to solve this issue, but I appreciate your assistance regardless. Thank you for your help!

vujennyfer
8 - Asteroid

Hi dcoelho, 

Do you mind sharing what route you ended up taking to solve this issue? I have the exact same issue you have illustrated above and would appreciate any insight! Thank you.

Labels