Alteryx Designer Desktop Discussions

Fuzzy Match

mihir_mir_jb

Hello all,

Hope you are doing well.

I have data containing company names that are tagged differently for the same company (examples are in the attached file named "Compnay cluster"). I am trying to use the Fuzzy Match tool, as in the workflow I have attached. However, the output is not quite what I was hoping for. For instance, for company names starting with ABC, I was hoping all the names would cluster into one name, say ABC Co Ltd, but they are still tagged differently after the fuzzy match, which I understand is due to the matching criteria. Could you please help me improve the accuracy?

 

 

Treyson

@mihir_mir_jb Welcome to the world of fuzzy matching! 

 

Fuzzy matching is definitely a process you will need to experiment with. Since it's not an exact science, the best settings depend on your use case. For example, in your workflow the match threshold in the Fuzzy Match tool's settings is set to around 85%. That is a fairly strict threshold. It may produce correct matches, but it also means that CO and COMPANY will get a low match score and will not be matched.
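To see why a strict threshold misses abbreviations, here is a small Python sketch. It assumes a plain Levenshtein edit distance scaled by the length of the longer string; this is an illustrative stand-in, not necessarily the exact score Alteryx computes internally, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to 0-1 (assumed formula, for illustration only)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


THRESHOLD = 0.85  # roughly the 85% match threshold discussed above

for left, right in [("CO", "COMPANY"), ("ABC CO LTD", "ABC COMPANY LTD")]:
    score = similarity(left, right)
    verdict = "match" if score >= THRESHOLD else "no match"
    print(f"{left!r} vs {right!r}: {score:.2f} -> {verdict}")
```

Under this scoring, "CO" vs "COMPANY" scores 0.29 and "ABC CO LTD" vs "ABC COMPANY LTD" scores 0.67, so both fall below 0.85 even though a person would call them the same company.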

 

For fuzzy matching, there are a few things I would recommend doing before, inside, and after the actual Fuzzy Match tool.

 

1) Do some of your own data scrubbing first. If you know values like "&" or "Co." exist in your data, replace them with a standard form: "&" becomes "and", "Co" becomes "Company", etc.

2) Play with the match threshold. You might get down to 75% before you see things making sense.

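3) Play around with the "match function". I always use "Best of Jaro and Levenshtein Distance". You may want to do some light googling into what those are, but essentially they measure how close two strings are character by character: Levenshtein distance counts the single-character insertions, deletions, and substitutions (roughly, keystrokes) needed to turn one word into the other, while Jaro scores the characters two words share and how many of them are transposed.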

4) Once that looks good, bring in the "Make Group" tool. It's sort of an unsung hero in the Alteryx tool set because it has a very specific purpose. Essentially, it looks at two columns and says: if one record has A = B and another record has B = C, then A = C. That will help clean up your results.

5) You may also want multiple match levels. For example, if you want to see both the 85% matches and the 75% matches, use a waterfall method: run both matching strategies, flag which one each match came from, and decide which matches you want to keep. As you work through the solution, this also shows you where matching at a given threshold stops making sense.
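The five steps above can be sketched in Python as a rough illustration outside Alteryx. This is not how the Fuzzy Match or Make Group tools are implemented: the abbreviation table, the scaled Levenshtein score, the sample names, and every function name are hypothetical. The grouping step uses a union-find structure to mimic the transitive "A = B and B = C, so A = C" behaviour described in step 4.

```python
import re
from itertools import combinations

# Hypothetical abbreviation table (an assumption); extend it with what appears in your data.
TOKEN_MAP = {"CO": "COMPANY", "LTD": "LIMITED", "INC": "INCORPORATED"}


def normalize(name: str) -> str:
    """Step 1: scrub a company name before matching."""
    name = name.upper().replace("&", " AND ")
    name = re.sub(r"[^A-Z0-9 ]", " ", name)  # turn punctuation such as "." into spaces
    return " ".join(TOKEN_MAP.get(tok, tok) for tok in name.split())


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to 0-1 (an assumed formula, not necessarily Alteryx's)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def match_pairs(names, threshold):
    """Steps 2-3: compare every pair of scrubbed names and keep those at or above threshold."""
    cleaned = {n: normalize(n) for n in names}
    return [(a, b) for a, b in combinations(names, 2)
            if similarity(cleaned[a], cleaned[b]) >= threshold]


def waterfall(names, thresholds=(0.85, 0.75)):
    """Step 5: strictest threshold first; flag each pair with the strictest level that found it."""
    flagged = {}
    for t in thresholds:
        for pair in match_pairs(names, t):
            flagged.setdefault(pair, t)
    return flagged


def make_groups(names, pairs):
    """Step 4: transitive grouping (A = B and B = C puts A, B, C together) via union-find."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


# Hypothetical sample data.
names = ["ABC Co. Ltd", "ABC Company Limited", "ABC Company Ltd.",
         "ABC Compani Lmtd", "ABC & Sons Co", "XYZ Inc", "XYZ Incorporated"]
flags = waterfall(names)
for pair, level in flags.items():
    print(pair, level)
for group in make_groups(names, flags):
    print(group)
```

With these sample names, the three ABC spellings that normalize to "ABC COMPANY LIMITED" match each other at the 85% level, "ABC Compani Lmtd" only joins them at 75%, and "ABC & Sons Co" stays in its own group. The level flags show which matches came from the looser pass, so you can review those before accepting them.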

 

In conclusion, fuzzy matching is not exact. You will have to play around with it until it is as good as it can get. There will be cases where you, as a person, know two names match but the tool doesn't pick them up. However, if you reach 99% confidence in your solution, you might find that acceptable.

 

Thanks for coming to my TED talk.

Treyson Marks
Senior Analytics Engineer