Alteryx Designer Desktop Discussions

SOLVED

Fuzzy match - more dynamic approach to "Don't Generate Keys for the following Words"?

chickenlicken
8 - Asteroid

I'm trying to fuzzy match thousands of company names. I have an Excel file of words that I would like to feed into the "Don't Generate Keys for the following Words" field in the Fuzzy Match tool, and I would prefer not to type them into the field one by one.

 

Do you know how I can do this?  

6 REPLIES
ChrisTX
15 - Aurora

Here's some text from the Comment section of this page:

 

Community > Designer > Browse Knowledge > Tips and Tricks for Fuzzy Matching
https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Tips-and-Tricks-for-Fuzzy-Matching/ta-p/1230

 

Q: How to exclude words from the fuzzy match?

I am matching company names. I have records such as 'ABC International Transport Services Ltd' and 'XYZ International Transport Services Ltd'.

They are different companies, but 'ABC' and 'XYZ' are a small proportion of the entire string, so 'International Transport Services' pushes the match score up to 95-97% and introduces false positives.


A: AndyM, Alteryx

Take a look at using Word Frequency Statistics as part of the Fuzzy Matching. Here is a reference in the Help to that topic. https://help.alteryx.com/11.0/index.htm#FuzzyEditMatchOptions.htm?Highlight="Word Frequency Statistics"

If, for example "International" and "Transport" and "Services" occurred often in the data, the frequency stats would tell the Fuzzy Matching to place less emphasis on those words. The higher the frequency of a word in the data, the less the emphasis for matching.
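To make the idea concrete, here is a minimal Python sketch of inverse-frequency weighting. It is not Alteryx's actual scoring; the functions `word_weights` and `similarity` and the sample names are made up for illustration, and the weighting formula (log of total names over names containing the word) is just one common choice.

```python
import math
from collections import Counter

def word_weights(names):
    """Inverse-frequency weight per word: log(N / number of names containing it)."""
    n = len(names)
    doc_freq = Counter()
    for name in names:
        # Count each word once per name, ignoring case.
        doc_freq.update(set(name.upper().split()))
    return {word: math.log(n / df) for word, df in doc_freq.items()}

def similarity(a, b, weights=None):
    """Jaccard overlap of the two word sets, optionally weighted per word."""
    words_a, words_b = set(a.upper().split()), set(b.upper().split())

    def weight(word):
        # Without weights every word counts as 1; unknown words count as 0.
        return 1.0 if weights is None else weights.get(word, 0.0)

    shared = sum(weight(w) for w in words_a & words_b)
    total = sum(weight(w) for w in words_a | words_b)
    return shared / total if total else 0.0

# Hypothetical sample data, not taken from any real dataset.
names = [
    "ABC International Transport Services Ltd",
    "XYZ International Transport Services Ltd",
    "ABC Holdings Ltd",
    "Northwind Traders Ltd",
]
weights = word_weights(names)
plain = similarity(names[0], names[1])
weighted = similarity(names[0], names[1], weights)
print(f"plain={plain:.2f} weighted={weighted:.2f}")
```

In this toy data the weighted score for the ABC/XYZ pair comes out lower than the unweighted one, because the shared words are the frequent ones and the distinguishing words are rare.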

========

Also see the built-in example under the Find Replace tool.
The Find Replace tool finds instances where a string contains a lookup list value, and either replaces it or appends additional fields to the table when a match is found.

 

Replace table:
word   replacement
&      AND
CO     COMPANY
CMPY   COMPANY
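
Outside Alteryx, the same kind of lookup-table standardization might look like the Python sketch below. The `normalize` function is hypothetical and replaces whole words only; the Find Replace tool has its own matching options, so treat this as an approximation of the idea rather than a description of the tool.

```python
# Lookup table copied from the replace table above.
REPLACEMENTS = {"&": "AND", "CO": "COMPANY", "CMPY": "COMPANY"}

def normalize(name, table=REPLACEMENTS):
    """Upper-case the name and swap any whole word found in the lookup table."""
    tokens = name.upper().split()
    return " ".join(table.get(token, token) for token in tokens)

print(normalize("Smith & Co"))
print(normalize("Acme Cmpy"))
```

Because the replacement is done word by word, a name such as "Costco" is left alone even though it contains "CO".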


Chris

chickenlicken
8 - Asteroid

This is really helpful, thank you, and I think it would be a solution. However, when I try to load my own Word Frequency Statistics yxmd file into the Fuzzy Match tool, I get the error "FileID does not match the File Header".

 

 

ChrisTX
15 - Aurora

Looking at the workflow "CollectStats.yxmd" mentioned below, it seems that a custom Word Frequency file should be named "Custom.yxdb" and saved in the folder \Program Files\Alteryx\bin\RuntimeData\FuzzyMatch\

 

What steps did you take when you got a FileID error?


See https://help.alteryx.com/20212/designer/fuzzy-match-edit-match-options

 

Word Frequency Statistics Location

 

Word Frequency Statistics are contained within Alteryx Database files (*.yxdb) and are located in the RunTime Data Directory:
  \Program Files\Alteryx\bin\RuntimeData\FuzzyMatch\

 

You can also create your own Word Frequency Statistics by editing the workflow CollectStats.yxmd located in the same directory.

 

chickenlicken
8 - Asteroid

Thanks Chris. I found that when I saved the file directly from the workflow into the RuntimeData folder, I got that error, but when I saved it elsewhere and then copied it into the folder, the error went away.

 

chickenlicken
8 - Asteroid

I'm still struggling with this. The Word Frequency Statistics work well, I think, but keys are still being generated on words that I don't want used as keys. I am having to look through the results, compile a list of these words, and paste it into the free-text field in the Fuzzy Match tool. It would be much easier to have an Input tool feed the list into this field. Is that possible?
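
One possible workaround, in the spirit of the Find Replace suggestion earlier in the thread, is to strip the unwanted words from the names before they reach the fuzzy match, so no keys can be generated on them. Below is a rough Python sketch of that preprocessing step. It assumes the Excel list has been exported to CSV with a column named `word`; that column name, the sample data, and both function names are hypothetical.

```python
import csv
import io

def load_stop_words(csv_text, column="word"):
    """Read the exclusion list from CSV text into an upper-cased set."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip().upper() for row in reader if row[column].strip()}

def strip_stop_words(name, stop_words):
    """Drop every whole word that appears in the exclusion list (case-insensitive)."""
    kept = [token for token in name.split() if token.upper() not in stop_words]
    return " ".join(kept)

# Stand-in for the exported word list; a real file would be read from disk.
SAMPLE = "word\nInternational\nTransport\nServices\nLtd\n"
stop_words = load_stop_words(SAMPLE)
cleaned = strip_stop_words("ABC International Transport Services Ltd", stop_words)
print(cleaned)
```

The cleaned name would then be matched in place of the original, while the original name is kept in a separate field for reporting.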

ChrisTX
15 - Aurora

Check the option for Generate keys for each word.

 

And at the bottom left, check the option for Output Generated Keys.

 

That will show you the words that are being used in the Key match portion.
