SOLVED

Fuzzy Matching Help

RobbleBobble
5 - Atom

Hey folks!

 

I'm having some issues with fuzzy matching. I've looked around the forums already, but fuzzy matching problems seem to depend heavily on the specific data and setup.

 

Most of the time it works great and the matches make sense. Sometimes, though, it will match two records for reasons I don't understand.

 

Attached is my setup for the tool. The words that I have it ignoring are:

AND OF THE CO INC HEALTH MEDICAL CENTER HEALTHCARE GENERAL MEMORIAL REGIONAL COUNTY PUBLIC LLC DIAGNOSTICS COLLEGE SOLUTIONS UNIVERSITY HOSPITAL DEPARTMENT

 

What I don't understand is why it ends up matching the attached records: "Kern Valley Hospital" and "DESERT VALLY HOSPITAL" (sic), with a match score of 86.

 

Any help would be greatly appreciated, thanks!

 

 

3 REPLIES
ArnavS
Alteryx

Hi RobbleBobble,

 

It looks like you are Fuzzy Matching on Jaro (No Spaces). This will ignore spaces between words and match based on all letters available. This may not be the method you are looking to use in this instance of Fuzzy Matching. Details on match options can be found here.  
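
To illustrate why a character-level Jaro comparison can pair these two names, here is a minimal Python sketch of the textbook Jaro similarity applied after removing spaces. This is an assumption-laden illustration, not Alteryx's implementation: the function names `jaro` and `jaro_no_spaces` are made up for this post, and the tool's exact scoring and scaling may differ.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_no_spaces(a: str, b: str) -> float:
    """Hypothetical helper: upper-case, drop spaces, then compare with Jaro."""
    return jaro(a.upper().replace(" ", ""), b.upper().replace(" ", ""))


# With HOSPITAL ignored, the remaining text still shares most of its letters.
score = jaro_no_spaces("KERN VALLEY", "DESERT VALLY")
print(round(score, 3))
```

Because the comparison is letter by letter, the shared E, R, V, A, L, L, Y characters all count as matches even though "KERN" and "DESERT" are different words, which is why the two names can score well above unrelated strings.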

 

Additionally, while you can use the text pre-processing built into the Fuzzy Match tool, doing that cleanup as separate steps earlier in the workflow may improve performance and make it easier to adjust later.
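
As a rough sketch of what that kind of cleanup step does, here is the same idea written in Python rather than with Alteryx tools. The word list is the one from the original post; the name `clean_name` and the tokenizing rule (letters and digits only) are assumptions made for illustration.

```python
import re

# Ignore list copied from the original post.
IGNORED_WORDS = set(
    "AND OF THE CO INC HEALTH MEDICAL CENTER HEALTHCARE GENERAL MEMORIAL "
    "REGIONAL COUNTY PUBLIC LLC DIAGNOSTICS COLLEGE SOLUTIONS UNIVERSITY "
    "HOSPITAL DEPARTMENT".split()
)


def clean_name(name: str) -> str:
    """Upper-case the name, drop punctuation, and remove ignored words."""
    tokens = re.findall(r"[A-Z0-9]+", name.upper())
    return " ".join(t for t in tokens if t not in IGNORED_WORDS)


print(clean_name("Kern Valley Hospital"))   # KERN VALLEY
print(clean_name("DESERT VALLY HOSPITAL"))  # DESERT VALLY
```

Keeping the cleaned value in its own field makes it easy to inspect exactly what text is being compared before any matching happens.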

 

Cheers,

ArnavS

echuong1
Alteryx Alumni (Retired)

Fuzzy match can be more of an art than a science. The underlying logic for keys and match function can sometimes be trial and error to see what works best with your data.

 

I added a couple of additional variations and played around with some of the functions and was able to get the logic to correctly identify the matches. See attached. 

 

You can reference the help documentation to see what each of the match functions and keys are looking for:

https://help.alteryx.com/current/designer/fuzzy-match-tool

 

Also, you may want to look at using Frequency Statistics and creating your own list instead of removing words like "hospital" completely. This assigns lower weights to words that occur more frequently, while still taking them into consideration for the match. Information on this is also in the help document.
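
To show the general idea of frequency weighting (not how the Fuzzy Match tool computes it internally, which this thread does not describe), here is a small Python sketch. The names `word_weights` and `weighted_overlap`, the log-based weight formula, and the sample names other than the two from the original post are all assumptions.

```python
import math
from collections import Counter


def word_weights(names):
    """Give each word a weight that shrinks as it appears in more names."""
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(name.upper().split()))
    n = len(names)
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in doc_freq.items()}


def weighted_overlap(a, b, weights, default=1.0):
    """Share of total word weight that the two names have in common."""
    words_a, words_b = set(a.upper().split()), set(b.upper().split())
    shared = sum(weights.get(w, default) for w in words_a & words_b)
    total = sum(weights.get(w, default) for w in words_a | words_b)
    return shared / total if total else 0.0


names = [
    "KERN VALLEY HOSPITAL",
    "DESERT VALLY HOSPITAL",
    "MERCY HOSPITAL",   # made-up sample record
    "KERN MEDICAL",     # made-up sample record
]
weights = word_weights(names)
print(weights["HOSPITAL"] < weights["DESERT"])
print(weighted_overlap(names[0], names[1], weights))
```

Here "HOSPITAL" still contributes to the score, but less than a rare word would, so sharing only that word is weak evidence of a match. Note that this sketch compares whole words exactly, so it would not connect "VALLEY" with "VALLY"; it only illustrates the weighting.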


 

Hope this helps!

RobbleBobble
5 - Atom

@ArnavS Thanks for the insight! I did some experimenting in the beginning, and the Jaro option produced the best results.

 

@echuong1 Agreed! Fuzzy matching is very convoluted, but certainly saves a lot of manual labor.

 

It turned out that switching from Double Metaphone to Alphanumeric as the key generation method worked much better. My goal was to have matches generated purely by how many words match and, where they don't, by how many letters they differ from each other. There's a ton more that I do in the workflow to massage the results, but this is the best outcome I've had so far.

 

The Frequency Statistics idea is interesting, I'll have to take a look at that.

 

Thanks everyone!
