
SOLVED

Using Fuzzy Matching

RohanShah
5 - Atom

Hi Folks,

I have been trying to use the Fuzzy Match tool in Alteryx to find similar names from two different sources. For simplicity, I am trying to find the match score between two names, e.g. "Kath" and "Katheline", but I am unable to generate a match score for them. I am attaching the workflow for reference.
I would be really thankful if someone could help with this.

Thanks
Rohan 

4 REPLIES
CharlieS
17 - Castor

I think what's throwing you off is that your test records are not matching under your Match Style/settings. Your module returns null match scores because the records did not match, and they still appear in the output only because you have the "Output Unmatched Records" checkbox selected. If you edit the forename value in Text Input (23) to 'Kathlen', you'll see that the rest of the module operates as expected.

 

Someone with more experience in fuzzy matching styles could chime in with some tips on that subject, but I hope that at least helps with module operation. 

DultonM
11 - Bolide

Hi @RohanShah!

 

To get started with Fuzzy Matching, I recommend watching some of the Alteryx training videos. Here is a link to one that helped me.

 

In the video, you will learn that the Fuzzy Match tool generates "Match Keys" for each record, for every field you are matching on. This initial step acts like a filter, so fewer pairs of records have to be processed by the more complicated fuzzy matching algorithms. If none of the keys associated with a record match a key of another record, the record is excluded from further matching and no match score is produced. In your output you can see that RecordID 100 ("Kathl") produces 2 keys: "K0L" and "KTL". RecordID 1 ("Kathleen") produces 2 different keys: "K0LN" and "KTLN". No keys in common = no match.
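To make the key filter concrete, here is a minimal Python sketch of the idea. It is not how Alteryx implements it internally; the key values are just the ones quoted from the output above, and the function names (shares_key, candidate_pairs) are made up for this illustration.

```python
from itertools import combinations


def shares_key(keys_a, keys_b):
    """Return True if the two records have at least one match key in common."""
    return not set(keys_a).isdisjoint(keys_b)


def candidate_pairs(keys_by_record):
    """Return the record ID pairs that survive the key filter.

    Only these pairs would go on to the more expensive fuzzy scoring step.
    """
    ids = sorted(keys_by_record)
    return [
        (a, b)
        for a, b in combinations(ids, 2)
        if shares_key(keys_by_record[a], keys_by_record[b])
    ]


# Keys as quoted in the post for RecordID 100 and RecordID 1.
keys_by_record = {
    100: {"K0L", "KTL"},
    1: {"K0LN", "KTLN"},
}

# No shared key, so the pair is filtered out before any score is computed.
print(candidate_pairs(keys_by_record))
```

With these two records the list of candidate pairs is empty, which is why no score ever shows up for them.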

 

Now let's pretend that the keys do match. The records may still come through as unmatched pairs if the algorithm you chose (Names w/ Nicknames in your case) scored them below the thresholds. There are 2 types of thresholds: the overall threshold you see on the main Fuzzy Match tool configuration screen (the value you changed to 20%) and the individual field threshold (which you can see at the bottom of the window that pops up when you click the "Edit..." button in the Match Field section). For a pair of records with a matching key to output from the Fuzzy Match tool with a score, each field's match score has to exceed that field's individual threshold AND the overall score (a weighted average of the field match scores) has to exceed the overall threshold.
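The two-level threshold check can be sketched in Python like this. This is only an illustration of the logic described above, not Alteryx's actual code: the function names, the example scores and weights, and the choice to treat a score equal to the threshold as passing are all assumptions.

```python
def weighted_average(scores, weights):
    """Overall score: weighted average of the per-field match scores."""
    total_weight = sum(weights[field] for field in scores)
    return sum(scores[field] * weights[field] for field in scores) / total_weight


def passes_thresholds(scores, field_thresholds, weights, overall_threshold):
    """Return True if a pair would be output with a score.

    Every field score must clear its own threshold, and the weighted
    average must clear the overall threshold. Using >= at the boundary
    is an assumption made for this sketch.
    """
    for field, score in scores.items():
        if score < field_thresholds[field]:
            return False
    return weighted_average(scores, weights) >= overall_threshold


# Hypothetical numbers: one field scored 55%, field threshold 60%, overall 20%.
scores = {"forename": 55.0}
weights = {"forename": 1.0}

print(passes_thresholds(scores, {"forename": 60.0}, weights, 20.0))  # field threshold blocks it
print(passes_thresholds(scores, {"forename": 0.0}, weights, 0.0))    # both thresholds at 0%
```

The first call shows why lowering only the overall threshold to 20% is not enough: the individual field threshold can still reject the pair.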

 

Hopefully all that in conjunction with the video makes sense. The key takeaway is that the Fuzzy Match tool intentionally doesn't show you scores for data that doesn't match very well. If your goal is to see a match score on all your data (good or bad), my recommendations would be:

  • Use the "Edit..." option to turn off key generation. (Note that at least 1 field has to have key generation.)
  • Use the "Edit..." option to lower the individual field thresholds to 0%, and change the overall threshold on the main configuration screen to 0%.

Note that following the above recommendations will slow down your Fuzzy Matching process if you are working with a lot of data, because far more pairs of records have to be scored. I hope this helps!
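If you just want a feel for what "a score on every pair" looks like outside Alteryx, here is a small Python sketch. It uses difflib.SequenceMatcher from the Python standard library, which is a different similarity measure from the Names w/ Nicknames style, so the numbers will not match what the Fuzzy Match tool produces. The function names are made up for this example.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_all_pairs(left, right):
    """Score every left/right combination, good or bad (no key filter, no thresholds)."""
    return [(a, b, similarity(a, b)) for a, b in product(left, right)]


for a, b, score in score_all_pairs(["Kath"], ["Katheline", "Kathleen"]):
    print(f"{a} vs {b}: {score:.0%}")
```

Every left record is compared with every right record, so the number of comparisons is the product of the two list sizes. That is the same reason removing the key filter slows things down on large inputs.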

RohanShah
5 - Atom

Thank you CharlieS for your response

RohanShah
5 - Atom

Thanks a lot, DultonM, for explaining the fuzzy tool and sharing the video link. This will surely help me tune some of the parameters and get the best out of it.
And yes, initially I am looking to obtain a match score irrespective of whether the match is good or bad, so your recommendations will surely help me.
Thanks 
