Differences between Word, Words & Digits, Character, and Character (No Spaces) in Fuzzy Match

whitkrieng
7 - Meteor

Hi all,

I'm struggling to find documentation on the differences between these Match Functions in the Fuzzy Match tool.

The preset "Address" configuration automatically selects "Words & Digits: Jaro Distance". I noticed this setting can generate false positives. If I raise the Match Threshold, I suspect I would also lose some legitimate matches.

Here is an example of a false positive with addresses:

1234 Larne Ave matches 1234 N Salem Ave

Obviously, to the naked eye these two addresses should not match, but "Words & Digits: Jaro Distance" calls it a match with a score of 85. When I switch to "Character: Jaro Distance", this match is removed, while all the other legitimate matches with a score of 85 are retained.

Is there any documentation on how these Match Functions work, with some examples? So far I haven't been able to find any specific to Alteryx. Thanks again for your help!

DataNath
12 - Quasar

There are no examples there, but there is an explanation of how the match functions work here: https://help.alteryx.com/20221/designer/fuzzy-match-edit-match-options

whitkrieng
7 - Meteor

Thanks for your response. I did look over the Help documentation, and this is all it states about Word vs. Character. It doesn't really explain what "Character" is; it only says it is used in addition to the "Word"-based functions. Does anyone know what the function achieves and which Match Function is preferred?

[Screenshot: Help documentation excerpt on the Word vs. Character match functions]

ArtApa
Alteryx
Alteryx

Hi @whitkrieng, a character is a single symbol; a string is a list of characters. In your scenario, "1234" and "Ave" jointly give you a very high matching score. I'd use pre-processing to remove "Ave" and other noise words. You could also use Regex to separate digits from words and fuzzy match the words only. Think about what makes sense for your data and test it.
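
A minimal Python sketch of that pre-processing idea, for illustration outside Designer. The noise-word list `NOISE_WORDS` and the function `split_address` are hypothetical names chosen here, and the regex assumes plain ASCII addresses; the split simply keeps digit tokens apart so they can be compared exactly, while only the remaining words would go to a fuzzy match.

```python
import re

# Hypothetical noise-word list for illustration; tune it to your own data.
NOISE_WORDS = {"AVE", "AVENUE", "ST", "STREET", "RD", "ROAD", "N", "S", "E", "W"}


def split_address(address):
    """Split an address into (digits, words), dropping noise words.

    Digit tokens (e.g. the house number) can then be compared exactly,
    and only the remaining words are passed to a fuzzy match.
    """
    # Uppercase, then keep runs of ASCII letters/digits (punctuation is discarded).
    tokens = re.findall(r"[A-Z0-9]+", address.upper())
    digits = [t for t in tokens if t.isdigit()]
    words = [t for t in tokens if not t.isdigit() and t not in NOISE_WORDS]
    return " ".join(digits), " ".join(words)


if __name__ == "__main__":
    for addr in ("1234 Larne Ave", "1234 N Salem Ave"):
        print(addr, "->", split_address(addr))
    # 1234 Larne Ave -> ('1234', 'LARNE')
    # 1234 N Salem Ave -> ('1234', 'SALEM')
```

After this split, the two addresses still share the house number, but the fuzzy comparison only sees LARNE vs. SALEM, so the shared "1234" and "Ave" no longer inflate the score.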

 

The documentation linked earlier also links to more detailed material, such as this page: https://rosettacode.org/wiki/Jaro_similarity
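
To illustrate the "a string is a list of characters" point, here is a minimal Python sketch of the textbook Jaro similarity described on that Rosetta Code page. It is written to accept any sequence, so a string is compared character by character, while a list of words is compared word by word. This is not Alteryx's implementation: it returns a value from 0 to 1 rather than Designer's 0 to 100 scale, and its word-level score for the thread's example (about 0.72) is well below the 85 Designer reported, so Designer's Word modes presumably work differently (for example, by not requiring exact word equality). Treat it only as a way to see how the choice of unit changes the comparison.

```python
def jaro_similarity(s1, s2):
    """Jaro similarity of two sequences (strings or lists of tokens), in [0, 1]."""
    len1, len2 = len(s1), len(s2)
    if len1 == 0 and len2 == 0:
        return 1.0
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two elements count as matching only if equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        lo = max(0, i - window)
        hi = min(i + window + 1, len2)
        for j in range(lo, hi):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched elements that appear in a different order in the two sequences.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2

    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


if __name__ == "__main__":
    a, b = "1234 LARNE AVE", "1234 N SALEM AVE"
    print(f"character-level: {jaro_similarity(a, b):.3f}")
    print(f"word-level:      {jaro_similarity(a.split(), b.split()):.3f}")  # 0.722
```

As a sanity check against the Rosetta Code task, `jaro_similarity("MARTHA", "MARHTA")` gives about 0.944 and `jaro_similarity("DIXON", "DICKSONX")` about 0.767.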

 

Please remember that fuzzy matching is not only a science; since it's "fuzzy", it's also an art.

You may want to review the following video: https://community.alteryx.com/t5/Archived-Training/Fuzzy-Matching-Intermediate-Users/td-p/43852 
