

Differences between Word, Words & Digits, Character, and Character (No Spaces) in Fuzzy

whitkrieng
8 - Asteroid

Hi all,

I'm struggling to find documentation on the differences between these Match Functions in the Fuzzy Match tool.

 

The preset configuration for "Address" automatically selects "Words & Digits: Jaro Distance". I noticed that this setting can generate some false positives. If I raised the Match Threshold, I feel I would also lose some legitimate matches.

 

Here is an example of a false positive with addresses:

 

1234 Larne Ave matches 1234 N Salem Ave 

 

Obviously these two addresses should not match to the naked eye, but "Words & Digits: Jaro Distance" calls it a match with a score of 85. When I switch to "Character: Jaro Distance", this false match is removed, while all the other legitimate matches with a score of 85 are retained.

 

Is there any documentation on how these Match Functions work, with some examples? So far I haven't been able to find any related to Alteryx. Thanks again for your help!

DataNath
17 - Castor

There are no examples there, but this help page explains how the match functions work: https://help.alteryx.com/20221/designer/fuzzy-match-edit-match-options

whitkrieng
8 - Asteroid

Thanks for your response. I did look over the Help documentation, and this is all that was stated for Word vs. Character. Nothing is really stated about what "Character" is. It just says it is used in addition to the "Word"-based functions. Does anyone have any idea what the function achieves and which Match Function is preferred?

 

[Attached screenshot: whitkrieng_0-1655927581654.png]

 

ArtApa
Alteryx

Hi @whitkrieng - A character is a single symbol; a string is a sequence of characters. In your scenario, "1234" and "Ave" jointly give you a very high matching score. I'd use pre-processing to remove "Ave" and other noise words. You could also use Regex to separate the digits from the words and fuzzy match on the words only. Think about what would make sense for your data and test it.
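
As a rough illustration of the pre-processing suggested above, here is a minimal Python sketch (written outside Alteryx; the same idea could be built in Designer with its own tools). The `NOISE_WORDS` set and the `split_address` helper are hypothetical names made up for this example, and the noise list itself is an assumption that would need tuning to your data.

```python
import re
from typing import Tuple

# Hypothetical noise list: street-type suffixes that add little to a match.
NOISE_WORDS = {"ave", "st", "rd", "blvd", "dr", "ln"}


def split_address(address: str) -> Tuple[str, str]:
    """Split an address into (digits, words), dropping noise words."""
    # Tokenize into runs of letters or runs of digits, ignoring punctuation.
    tokens = re.findall(r"[A-Za-z]+|\d+", address.lower())
    digits = " ".join(t for t in tokens if t.isdigit())
    words = " ".join(
        t for t in tokens if not t.isdigit() and t not in NOISE_WORDS
    )
    return digits, words


print(split_address("1234 Larne Ave"))    # ('1234', 'larne')
print(split_address("1234 N Salem Ave"))  # ('1234', 'n salem')
```

With the house number and "Ave" set aside, the fuzzy comparison only has "larne" versus "n salem" to work with, and the digits can be compared exactly as a separate key.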

 

The documentation linked earlier points to more detailed references, such as this one: https://rosettacode.org/wiki/Jaro_similarity
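
For anyone who wants to experiment, here is a minimal Python sketch of plain character-level Jaro similarity, following the standard definition described on the Rosetta Code page. It is not Alteryx's implementation, and it says nothing about how Designer tokenizes for "Word" or "Words & Digits"; the function name `jaro` is just for illustration.

```python
def jaro(s: str, t: str) -> float:
    """Standard Jaro similarity between two strings, in the range 0.0 to 1.0."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    if ls == 0 or lt == 0:
        return 0.0

    # Characters count as matching only if they are equal and close enough.
    window = max(max(ls, lt) // 2 - 1, 0)
    s_matched = [False] * ls
    t_matched = [False] * lt
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(lt, i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Walk the matched characters in order and count those out of sequence.
    k = 0
    out_of_order = 0
    for i in range(ls):
        if s_matched[i]:
            while not t_matched[k]:
                k += 1
            if s[i] != t[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / ls + matches / lt
            + (matches - transpositions) / matches) / 3


print(round(jaro("MARTHA", "MARHTA"), 4))  # 0.9444
```

Running `jaro` on the raw addresses and again on the addresses with "1234" and "Ave" stripped is a quick way to see how much those shared tokens influence the score.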

 

Please remember that Fuzzy Matching is not only a science. Since it's "fuzzy", it's also an art.

 

You may want to review the following video: https://community.alteryx.com/t5/Archived-Training/Fuzzy-Matching-Intermediate-Users/td-p/43852 
