
Alteryx Designer Desktop Discussions

SOLVED

Fuzzy Fiasco

warrencowan
9 - Comet

Hi everyone, I'm banging my head against the wall with the Fuzzy Match tool and need some help.

 

I'm trying to match slightly misspelled search terms, and despite endlessly reconfiguring the options, I can't get it to match terms that are very close but somehow just don't make the cut.

 

In the example output below from the fuzzy process, I have several misspellings of 'american' in 'searchterm', each paired with one of multiple matched terms in right_searchterm. The right_demand field is a data point that indicates the number of times the term is used externally, and I'm sorting on it in descending order. The idea is that the correct match will fall into the group and bubble to the top of each search term group in the list, allowing me to easily identify the most commonly used term to replace it with.
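For readers following along outside Designer, the "bubble to the top" step described above can be sketched roughly as follows. This is only an illustration in Python, not part of the workflow: the field names come from the post, while the function name `best_replacements` and the sample rows are hypothetical.

```python
def best_replacements(rows):
    """Pick, for each searchterm, the matched term with the highest demand.

    rows: iterable of (searchterm, right_searchterm, right_demand) tuples,
    mirroring the columns of the fuzzy match output described in the post.
    """
    best = {}
    for searchterm, candidate, demand in rows:
        # Keep the candidate only if it beats the current best for this group.
        if searchterm not in best or demand > best[searchterm][1]:
            best[searchterm] = (candidate, demand)
    return {term: candidate for term, (candidate, _) in best.items()}


# Hypothetical sample rows; the demand figures are made up for illustration.
sample = [
    ("amercan", "america", 120),
    ("amercan", "american", 950),
    ("americn", "american", 950),
]
print(best_replacements(sample))
```

This only helps if the correct term reaches the group at all, which is exactly the problem described next.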

 

What's got me stumped is how terms that fall so close to the intended match don't consistently cause the correct match to even appear in the group in the first place, so the wrong term ends up highlighted as the match.

 

My suspicion is that the syllable variance is creating different keys, which miss the intended match because the syllable makeup is marginally different even though there is only a one-character difference, but I haven't been able to find the right combination of settings to test or vary it.

 

fuzzy snap.PNG

 

Can anyone tell me what I'm doing wrong and how to fine-tune it?

 

Any help very much appreciated.

 

best, w

RogerS
Alteryx

If I understand what you are trying to do, you may not need to generate keys, but the Fuzzy Match tool requires key generation for at least one field. To get around this, I created a placeholder column with a value of X to generate the keys, and then matched on the Name field with no keys generated, using character Levenshtein distance, which tells me how many changes are needed to make the two strings match. In the attached example, 7 out of 8 characters match, so the match score is 87.5%, which rounds up to 88%.
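To make the arithmetic above concrete, here is a minimal Python sketch of character Levenshtein distance and a percentage score derived from it. It assumes the score is computed as 1 minus the distance divided by the longer string's length; that formula reproduces the 87.5% figure but is an assumption, not a confirmed description of how the Fuzzy Match tool scores internally. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes, or substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete from a
                current[j - 1] + 1,                   # insert into a
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def match_score(a: str, b: str) -> float:
    """Assumed scoring: share of the longer string left unchanged, as a percentage."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


# One substitution in an 8-character word: 7 of 8 characters match.
print(levenshtein("american", "americen"))  # 1
print(match_score("american", "americen"))  # 87.5
```

Because this comparison works on the raw characters rather than on generated keys, a one-character misspelling always scores close to the correct term, whatever its syllable makeup.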

 

 

 

warrencowan
9 - Comet
Thanks Roger, that's a neat workaround, and thanks for attaching the example too.