We are celebrating the 10-year anniversary of the Alteryx Community! Learn more and join in on the fun here.
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Guide needed for Fuzzy Tool Calculations

monsieursnow
5 - Atom

Hello Community

 

I'm working on matching names in 2 separate tables using the Alteryx Fuzzy Tool.

I need some guidance in terms of how the actual match scores are being calculated behind the scenes.

 

The following are some sample test names that i have:

RecordIDSourceField1
IAJohn Smith
IIADonald Trump
IIIAVladimr Putin
1BJan Smith
2BJon Smeeth
3BJon Sam
4BDuck Trump
5BVlad Power
6BDonald Trump
7BJohn Smith
8BVladimr Putin

 

 

The following are the settings i've used on the Fuzzy Tool:

 

monsieursnow_0-1665368874445.png

 

The following results are what i've obtained:

 

RecordIDNameRecordID2TestMatchScoreMatchKey
1Jan SmithIJohn Smith88AN
1Jan SmithIJohn Smith88JN
1Jan SmithIJohn Smith88SM0
1Jan SmithIJohn Smith88XMT
2Jon SmeethIJohn Smith85AN
2Jon SmeethIJohn Smith85JN
2Jon SmeethIJohn Smith85SM0
2Jon SmeethIJohn Smith85XMT
3Jon SamIJohn Smith80AN
3Jon SamIJohn Smith80JN
7John SmithIJohn Smith100AN
7John SmithIJohn Smith100JN
7John SmithIJohn Smith100SM0
7John SmithIJohn Smith100XMT
4Duck TrumpIIDonald Trump74TRMP
6Donald TrumpIIDonald Trump100TNLT
6Donald TrumpIIDonald Trump100TRMP
8Vladimr PutinIIIVladimr Putin100FLTMR
8Vladimr PutinIIIVladimr Putin100PTN

 

Need a little guidance on how these calculations work.

I've watched the tutorials available here, but it doesn't go to the detail on how the match scores are calculated.

 

1) Key Generation

Double metaphone seems to have a default max key length of 8.

I've learnt that records in which the keys don't match are discarded.

Does it generate key for each word in a string?

Might need a little help in showing some examples of how keys are generated here.

 

2)Jaro Distance vs Levenshtein Distance

For Jaro Distance, based on the formula, i can briefly understand how the scoring is done.

But for LD i need a little help, since LD is the number of edits needed to translate String1 to String2.

How is the LD score weighted then translated to a match score, such that it gives JD an equivalent basis for comparison, then allowing the tool to pick the best of both?

 

3)Word-based vs character (no spaces)

I'm trying to understand how the scoring for this works.

Does it calculate the best of scores (Jaro and LD) for each word in the same string, then take the weight depending on how many words there are?

If so, then when does the scoring for character (no spaces) come in?

 

4)Preprocess: Strip Punctuation vs Strip Punctuation before Fuzzy Tool vs Character (no spaces)

Preprocess: Strip Punctuation

Because strip punctuation takes a longer time to process using the Tool, i decided not to use the preprocess.

 

Strip Punctuation before Fuzzy Tool

I've considered using Strip Punctuation before feeding it into the Fuzzy Tool.

The results i have seems to imply that word order is not taken into consideration when i do this (can't find my old examples for some reason).

Word order is important to me, hence i didn't choose this option.

 

Character (no spaces)

Now for the last alternative; i wanted to understand better on Character (no Spaces) when using a word-based match.

Somehow this gives a better result and also takes into account word order.

I was wondering if someone might be able to better explain how this works, as an alternative to "Strip Punctuation before Fuzzy Tool"?

 

 

 

Prefer with sticking the current settings i have at the moment for simplicity of explanation.

Of course, if there is a better setting i should use, do provide your suggestions as well.

 

Many thanks on this!

2 REPLIES 2
AndrewSu
Alteryx Alumni (Retired)

@monsieursnow , please see our help documentation below which should provide some clarity on the questions you have. 

 

https://help.alteryx.com/20221/designer/fuzzy-match-edit-match-options 

 

Have you checked out our Tool Master article as well?

 

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-Fuzzy-Match/ta-p/45485

 

 

 

monsieursnow
5 - Atom

Hello Andrew

Thank you for responding.

I've read the articles you mentioned, watched the video on Fuzzy Matching and also did the interactive lesson on it, before I posted this.

But was still unable to understand it 100%.

There seems to be a proprietary matching engine within the Fuzzy Tool was my conclusion.

Labels
Top Solution Authors