I'm trying to fuzzy match thousands of company names, and I have an Excel file of words that I would like to feed into the "Don't generate Keys for the following words" field in the Fuzzy Match tool. I would prefer not to have to type them into the field one by one.
I am matching company names. I have records such as 'ABC International Transport Services Ltd' and 'XYZ International Transport Services Ltd'.
They are different companies, but 'ABC' and 'XYZ' make up only a small proportion of the entire string. The shared 'International Transport Services' therefore pushes the match score up to 95-97% and introduces false positives.
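To see why the shared words dominate, here is a minimal Python sketch that scores the two example names with the standard library's `difflib.SequenceMatcher`. This is only a stand-in for illustration: it is not the algorithm the Fuzzy Match tool uses, and its scores will not match Alteryx's.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio, not Alteryx's score)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

name_a = "ABC International Transport Services Ltd"
name_b = "XYZ International Transport Services Ltd"

# Only the first three characters differ, so the long shared tail drives the score up.
score = similarity(name_a, name_b)
print(f"{score:.3f}")
```

Comparing just the distinguishing parts, `similarity("ABC", "XYZ")`, gives no overlap at all, which is the signal that gets drowned out in the full strings.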
If, for example, "International", "Transport" and "Services" occurred often in the data, the frequency stats would tell the Fuzzy Match tool to place less emphasis on those words. The higher the frequency of a word in the data, the less weight it carries in matching.
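The idea of down-weighting frequent words can be sketched in Python as follows. This is an illustration of inverse-frequency weighting in general, not the Fuzzy Match tool's actual formula; the sample names, `word_weights`, `weighted_similarity` and `unseen_weight` are all made up for the example.

```python
import math
from collections import Counter

def word_weights(names):
    """Give each word a weight that shrinks as it appears in more names."""
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(name.upper().split()))
    total = len(names)
    return {word: math.log(total / count) for word, count in doc_freq.items()}

def weighted_similarity(a, b, weights, unseen_weight=1.0):
    """Weighted word overlap: weight of shared words / weight of all words."""
    words_a, words_b = set(a.upper().split()), set(b.upper().split())

    def weight_of(words):
        return sum(weights.get(w, unseen_weight) for w in words)

    combined = weight_of(words_a | words_b)
    # Guard against division by zero when every word has zero weight.
    return weight_of(words_a & words_b) / combined if combined else 0.0

# Hypothetical sample data, standing in for the real company list.
names = [
    "ABC International Transport Services Ltd",
    "XYZ International Transport Services Ltd",
    "Northern International Services Ltd",
    "Delta Transport Services Ltd",
    "Omega International Transport Ltd",
]
weights = word_weights(names)

plain = weighted_similarity(names[0], names[1], {})          # every word counts equally
weighted = weighted_similarity(names[0], names[1], weights)  # common words count less
print(f"plain={plain:.2f} weighted={weighted:.2f}")
```

With equal weights, the four shared words out of six make the two names look alike. Once the common words are down-weighted, the rare words 'ABC' and 'XYZ' dominate and the score drops.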
Also see the built-in example under the Find Replace tool. The Find Replace tool finds instances where a string contains a lookup list value, and either replaces it or appends additional fields to the table when a match is found.
Replace table:

word    replacement
&       AND
CO      COMPANY
CMPY    COMPANY
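The effect of the replace table above can be sketched in Python. This is not how the Find Replace tool is implemented. The sketch assumes upper-casing and whole-word matching on whitespace-separated tokens, so a variant such as "Co." with a trailing period would need its own row.

```python
# Replacement rows taken from the replace table above.
REPLACEMENTS = {
    "&": "AND",
    "CO": "COMPANY",
    "CMPY": "COMPANY",
}

def normalize(name: str) -> str:
    """Upper-case the name and swap each whole word found in the table."""
    tokens = name.upper().split()
    return " ".join(REPLACEMENTS.get(token, token) for token in tokens)

print(normalize("Smith & Co"))
print(normalize("Smith and Cmpy"))
```

Both calls produce the same normalized string, so variants of the same company name line up before any fuzzy matching is attempted.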
This is really helpful, thank you, and I think it would be a solution. But when I try to load my own Word Frequency statistics yxmd file into the Fuzzy Match tool, I get the error "FileID does not match the File Header".
Looking at the workflow "CollectStats.yxmd" mentioned below, it seems that a custom Word Frequency file should be named "Custom.yxdb" and saved in the folder \Program Files\Alteryx\bin\RuntimeData\FuzzyMatch\
What steps did you take when you got a FileID error?
Thanks Chris. I found that when I saved the file directly from the workflow into the RuntimeData folder, I got that error, but when I saved it elsewhere and then copied it into the folder, the error was resolved.
I'm still struggling with this. I think the word frequency statistics work well, but keys are still being generated on words that I don't want used as keys. I am having to look through the results, compile a list of these words, and then paste it into the free text field in the Fuzzy Match tool. It would be much easier to have an Input tool that fed into this field. Is this possible?