on 05-08-2013 07:49 AM - edited on 07-27-2021 11:49 PM by APIUserOpsDM
Hello,
Thank you for this informative article.
I would be keen to see your suggestions on how to exclude words from the fuzzy match.
I am matching company names. I have records such as 'ABC International Transport Services Ltd' and 'XYZ International Transport Services Ltd'.
They are different companies, but 'ABC' and 'XYZ' make up only a small proportion of each string, so the shared 'International Transport Services' pushes the match score to 95-97% and introduces false positives.
I have tried excluding these words via the "Don't generate keywords for the following words" option, but I am not sure whether it made any improvement.
Thanks
JS
@JS_dup_135 Take a look at using Word Frequency Statistics as part of the Fuzzy Matching. Here is a reference in the Help to that topic. https://help.alteryx.com/11.0/index.htm#FuzzyEditMatchOptions.htm?Highlight="Word Frequency Statistics"
If, for example, "International", "Transport", and "Services" occurred often in the data, the frequency statistics would tell the Fuzzy Match to place less emphasis on those words. The higher the frequency of a word in the data, the less emphasis it receives in matching.
Andy
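To make the word-frequency idea above concrete, here is a minimal Python sketch. It is not how the Fuzzy Match tool computes its scores internally; it only illustrates the general principle that words appearing in many records get a smaller weight. The function names (build_weights, weighted_similarity, plain_similarity) and the sample company names beyond the two from the question are made up for illustration.

```python
import math
from collections import Counter


def tokenize(name):
    """Lower-case a company name and split it into words."""
    return name.lower().split()


def build_weights(names):
    """Give each word a weight that shrinks as it appears in more records.

    A word found in every record gets weight 0; rare words get the most.
    """
    n = len(names)
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(tokenize(name)))
    return {w: math.log((1 + n) / (1 + c)) for w, c in doc_freq.items()}


def weighted_similarity(a, b, weights):
    """Weighted word overlap: shared weight divided by total weight."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    # Words never seen when building the weights are treated as rare.
    default = max(weights.values(), default=1.0)
    shared = sum(weights.get(w, default) for w in ta & tb)
    total = sum(weights.get(w, default) for w in ta | tb)
    return shared / total if total else 0.0


def plain_similarity(a, b):
    """Unweighted word overlap, for comparison."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# Hypothetical sample data; only the first two names come from the question.
names = [
    "ABC International Transport Services Ltd",
    "XYZ International Transport Services Ltd",
    "ABC Intl Transport Services Limited",
    "Acme International Transport Ltd",
    "Northwind Transport Services Ltd",
]
weights = build_weights(names)
print(plain_similarity(names[0], names[1]))
print(weighted_similarity(names[0], names[1], weights))
```

With this sample data the weighted score for the 'ABC' and 'XYZ' records comes out lower than the unweighted one, because 'Transport', 'Services' and 'Ltd' occur in most records and so contribute little, while the distinguishing 'ABC' and 'XYZ' carry more weight.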
Sorry, but another newbie fuzzy matching question. This statement from the Tips above has me wondering:
3. In a Merge Fuzzy Match, usually the left side of the Match is the Master file (for example, the Experian HH file or the Info USA file). The right side is the customer file, or the file we are trying to match to the master file. Given this setup, in each of the different passes of the fuzzy match we do not send records that have a match from the left into the next pass if they have matched.
I can see why the author said "usually": I've seen exactly this behaviour, and it is sort of maddening. Sometimes my "master file" keys are on the left and sometimes they are on the right. This means additional protective processing when matching the fuzzy results back to the incoming data. In other words, you have to program your logic to allow for the master file keys being on either the left or the right. Is there any way to ensure the master is always on the left?
Rob
Hear, hear to rdalley. It is indeed tedious to have to work out, after a merge fuzzy match, which side each value ended up on; in practice you need an extra join after the fuzzy tool.
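For what it's worth, the "protective processing" described above can be sketched in a few lines of Python. This is only an illustration of the swap logic, assuming the match output has been exported as pairs of record keys and that you have the set of master file keys available; orient_pairs and the sample keys are hypothetical names, not part of any Alteryx tool.

```python
def orient_pairs(pairs, master_ids):
    """Reorder each matched pair so the master key always comes first.

    pairs: iterable of (key_a, key_b) tuples from the fuzzy match output.
    master_ids: set of keys known to belong to the master file.
    Returns (oriented, ambiguous), where ambiguous holds pairs in which
    both keys or neither key belong to the master file.
    """
    oriented, ambiguous = [], []
    for a, b in pairs:
        a_is_master = a in master_ids
        b_is_master = b in master_ids
        if a_is_master and not b_is_master:
            oriented.append((a, b))
        elif b_is_master and not a_is_master:
            oriented.append((b, a))  # swap so the master key is on the left
        else:
            ambiguous.append((a, b))
    return oriented, ambiguous


# Hypothetical keys: M* are master records, C* are customer records.
master_ids = {"M1", "M2", "M3"}
pairs = [("M1", "C7"), ("C9", "M2"), ("M1", "M3")]
oriented, ambiguous = orient_pairs(pairs, master_ids)
print(oriented)
print(ambiguous)
```

After this step every pair in oriented has the master key first, so a single join back to the incoming data is enough. Pairs where both keys come from the same file are kept separately so they can be reviewed rather than silently dropped.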