Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

fuzzy match help

mst3k
11 - Bolide

The fuzzy match tool is pretty daunting, I've looked at a couple workflows from forum posts, I've looked at the built in examples in Designer, and I can't quite get this to work the way I want.

 

I want to conform values in 1 column, so I want Purge mode I believe. The values will be user IDs which may have a 1 character typo etc, so I'm using the custom match style, and generating keys for Whole Field, and the match function is Character (No Spaces): Best of Jaro & Levenshtein Distance. I don't totally understand what I'm doing with all that, but it sounds reasonable.

 

What's the difference between the match threshold in the regular tool configuration, and another match threshold in the Edit Match Options (where the Match Weight is also found)? I'm also confused that I select "Output Unmatched Records" and get fewer records than I started with? Also, I don't understand the addition of the RecordID tool (though I did it anyway) - the built in examples in Designer start with using the target field as both the match and recordID fields.

 

Here are some sample values, including special characters (one has a pipe, one has a colon). I'd like them to merge into just 2 distinct values. I tried tinkering with it a lot and couldn't get there

 

mst3k_0-1632727569663.png

 

 

2 REPLIES 2
NicholasM
Alteryx Alumni (Retired)

Hey @mst3k,

 

Take a look at the workflow I have attached that achieves what you would like to accomplish. There are a few techniques I'll try my best to briefly describe.  

 

1) They Fuzzy Match tool, regardless if you use merge or purge mode, will match every record against each other. For this reason it needs a way to determine what is a "unique entry". The best way to easily give the unique entry field is to simply add the record ID. In your case, if your UserID has 0 potential to contain exact duplicates, that could be the "Record ID" Field in the fuzzy match, however if it could contain duplicates the record ID tool beforehand would be necessary. 

 

2) The output of the Fuzzy Match tool will tell you which records matched with eath other. These could be many, Record 1 to record 2, record 2 to record 3, etc. This means that you will not simply get 2 unique records after only the fuzzy match. There will be some post processing that needs to be done. Typically, if you are trying to create a unique list of acceptable names, the "Make Groups" tool is the answer. This will create a lookup table to correct the original file with. 

 

3) The critical part of the fuzzy match with your workflow will be to strip punctuation. This can be done in advanced settings of the fuzzy match style. I used the match style Name, and it worked just fine. 

 

4) If you have a free hour, take a look at this webinar made by Nick Smith, It does a great job at describing the settings a bit more in depth. 

 

If you have any questions, please let me know. I hope this helps.  

mst3k
11 - Bolide

Wow this is great! Thanks for the explanation and I'll definitely watch that tutorial!

Labels