Any address fields that you will be Fuzzy Matching on should be run through the CASS tool to standardize their values. Deselect the original address fields from the data stream with a Select tool to prevent confusion and to minimize processing time (recall optimization tips…).
When merging two data files (Merge Fuzzy Match process), use a join to remove any exact matches before the fuzzy match process. Along the same lines, you can use the Unique tool to remove any exact matches before entering your Purge Fuzzy Match process.
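The idea of pulling exact matches out before the fuzzy step can be sketched outside Alteryx in plain Python. This is only an illustration of the logic, not how the Join or Unique tools are implemented; the record layout, the field names, and the helper `split_exact_matches` are all hypothetical.

```python
# Hypothetical sample records; the field names are assumptions for illustration.
master = [
    {"id": "M1", "name": "JOHN SMITH", "address": "12 OAK ST", "zip": "80301"},
    {"id": "M2", "name": "MARY JONES", "address": "9 ELM AVE", "zip": "80302"},
]
customers = [
    {"id": "C1", "name": "JOHN SMITH", "address": "12 OAK ST", "zip": "80301"},  # exact match
    {"id": "C2", "name": "MARY JONS", "address": "9 ELM AVE", "zip": "80302"},   # near miss
]

KEY_FIELDS = ("name", "address", "zip")


def key(record):
    """Build the exact-match key from the chosen fields."""
    return tuple(record[field] for field in KEY_FIELDS)


def split_exact_matches(left, right):
    """Return (exact_pairs, left_remaining, right_remaining)."""
    left_by_key = {}
    for rec in left:
        left_by_key.setdefault(key(rec), []).append(rec)

    exact_pairs, right_remaining, matched_left_ids = [], [], set()
    for rec in right:
        hits = left_by_key.get(key(rec))
        if hits:
            for hit in hits:
                exact_pairs.append((hit["id"], rec["id"]))
                matched_left_ids.add(hit["id"])
        else:
            right_remaining.append(rec)

    left_remaining = [r for r in left if r["id"] not in matched_left_ids]
    return exact_pairs, left_remaining, right_remaining


exact_pairs, master_remaining, customers_remaining = split_exact_matches(master, customers)
```

Only `master_remaining` and `customers_remaining` would then go on to the fuzzy step, which keeps the expensive comparison to the records that actually need it.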
In a Merge Fuzzy Match, the left side of the match is usually the Master file (for example, the Experian HH file or the Info USA file). The right side is the customer file, that is, the file we are trying to match to the master file. Given this setup, a record from the left side that has already matched in one pass of the fuzzy match is not sent into the next pass.
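A rough Python sketch of this multi-pass flow follows. It is an assumption-laden illustration: `difflib.SequenceMatcher` stands in for whatever scoring the Fuzzy Match tool uses, the field names and thresholds are invented, and withholding matched customer records (not only matched master records) from later passes is an assumption as well.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in [0, 100]; not Alteryx's algorithm."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def run_pass(master, customers, field, threshold):
    """Match each customer to its best-scoring master record on one field."""
    matches = []
    for cust in customers:
        best = max(master, key=lambda m: similarity(m[field], cust[field]), default=None)
        if best is not None and similarity(best[field], cust[field]) >= threshold:
            matches.append((best["id"], cust["id"]))
    return matches


def run_passes(master, customers, passes):
    """Run the passes in order, withholding matched records from later passes."""
    remaining_master, remaining_customers = list(master), list(customers)
    all_matches = []
    for field, threshold in passes:
        matches = run_pass(remaining_master, remaining_customers, field, threshold)
        all_matches.extend(matches)
        matched_m = {m for m, _ in matches}
        matched_c = {c for _, c in matches}  # assumption: right side is withheld too
        remaining_master = [r for r in remaining_master if r["id"] not in matched_m]
        remaining_customers = [r for r in remaining_customers if r["id"] not in matched_c]
    return all_matches, remaining_master, remaining_customers


# Hypothetical data: C1 matches on name in pass 1, C2 only on address in pass 2.
master = [
    {"id": "M1", "name": "JOHN SMITH", "address": "12 OAK ST"},
    {"id": "M2", "name": "MARY JONES", "address": "9 ELM AVE"},
]
customers = [
    {"id": "C1", "name": "JOHN SMITH", "address": "12 OAK STREET"},
    {"id": "C2", "name": "M JONES", "address": "9 ELM AVE"},
]
matches, left_m, left_c = run_passes(master, customers, [("name", 90.0), ("address", 90.0)])
```

Because M1 is consumed in the first pass, the second pass only has to consider M2, which is the point of not forwarding matched records.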
The logic of the fuzzy match is that one or more fields are treated as the strong piece, the part we are most confident about. The remaining fields are the ones we compare via fuzzy logic in order to find a match. For example, Address and ZIP could be the strong piece, with First and Last name as the fuzzy match piece.
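As a minimal sketch of the strong-piece and fuzzy-piece split, the Python below requires Address and ZIP to agree exactly and only then compares names approximately. The scoring function (`difflib.SequenceMatcher`), the field names, and the threshold of 80 are assumptions for illustration, not the Fuzzy Match tool's behavior.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def name_score(a, b):
    """Stand-in fuzzy score in [0, 100]; not Alteryx's algorithm."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def match_within_blocks(master, customers, threshold=80.0):
    """Compare names only between records that share the strong key."""
    blocks = defaultdict(list)
    for m in master:
        blocks[(m["address"], m["zip"])].append(m)  # strong piece: Address + ZIP

    pairs = []
    for c in customers:
        for m in blocks.get((c["address"], c["zip"]), []):
            score = name_score(f'{m["first"]} {m["last"]}', f'{c["first"]} {c["last"]}')
            if score >= threshold:  # fuzzy piece: First + Last name
                pairs.append((m["id"], c["id"], score))
    return pairs


# Hypothetical records: C1 shares M1's address, C2 has the same name elsewhere.
master = [
    {"id": "M1", "first": "JOHN", "last": "SMITH", "address": "12 OAK ST", "zip": "80301"},
]
customers = [
    {"id": "C1", "first": "JON", "last": "SMITH", "address": "12 OAK ST", "zip": "80301"},
    {"id": "C2", "first": "JON", "last": "SMITH", "address": "44 PINE RD", "zip": "80305"},
]
pairs = match_within_blocks(master, customers)
```

C2 is never scored against M1 because the strong piece disagrees, even though the names are just as similar as for C1.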
Match Thresholds and Weights:
For the Fuzzy Match tool as a whole, we define the Total Match Threshold, which is applied to the final score.
For each field we are processing in the Fuzzy Match tool, we can declare a match threshold and a match weight for that field.
Whether it is set at the field level or at the Fuzzy Match tool level, the match threshold is a strict cutoff value. Candidates whose match value is greater than your match threshold are accepted as matches; those that fall below it are dropped as match candidates.
The match weight of a particular field allows you to assign relative importance to the various fields that you are matching on. If you have 3 fields with match weights of A=100, B=100, C=75, then you are telling Alteryx that field C should contribute less to the determination of whether or not a match is made. (As humans, think of how we might select a partner to dance with…)
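To make the interplay of weights and thresholds concrete, here is a small Python sketch. The exact formula Alteryx uses is not given here, so this assumes the total score is a weighted average of the per-field scores; the sample scores, the threshold values, and the treatment of a score exactly equal to a threshold are likewise assumptions.

```python
def total_score(field_scores, weights):
    """Weighted average of field scores (assumed formula, for illustration only)."""
    total_weight = sum(weights.values())
    return sum(field_scores[f] * weights[f] for f in weights) / total_weight


def is_match(field_scores, weights, field_thresholds, total_threshold):
    """Apply the field-level cutoffs first, then the total cutoff."""
    for field, score in field_scores.items():
        if score < field_thresholds[field]:
            return False  # any field below its threshold drops the candidate
    return total_score(field_scores, weights) >= total_threshold


# Hypothetical candidate pair: strong on A and B, weak on C.
scores = {"A": 90.0, "B": 80.0, "C": 40.0}
field_thresholds = {"A": 60.0, "B": 60.0, "C": 30.0}

weighted = {"A": 100, "B": 100, "C": 75}   # the weights from the example above
equal = {"A": 100, "B": 100, "C": 100}

weighted_total = total_score(scores, weighted)   # (9000 + 8000 + 3000) / 275
equal_total = total_score(scores, equal)         # (9000 + 8000 + 4000) / 300

accepted_weighted = is_match(scores, weighted, field_thresholds, total_threshold=72.0)
accepted_equal = is_match(scores, equal, field_thresholds, total_threshold=72.0)
```

Under these assumptions, lowering the weight on C lets the weak C score drag the total down less, which is enough to move this candidate from rejected to accepted at a total threshold of 72.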
After passing your data through the Fuzzy Match tool, join back to your original data to compare the matches. You can sort by match score to see whether you need to raise or lower your match thresholds, depending on the presence of false-positive matches or of false negatives (true matches that were missed). As Fuzzy Matching is inherently fuzzy, it is quite common, and in fact necessary, to run your module many times with different parameters. As a general rule, the more consideration you place on developing a thorough fuzzy matching module, the greater the value of your output.
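The review step can be sketched in Python as follows. The scored pairs, the candidate thresholds, and both helper functions are hypothetical; in Alteryx the same inspection would typically be done with Sort and Browse tools.

```python
# Hypothetical output of a fuzzy match run: one score per candidate pair.
scored_pairs = [
    {"left": "M1", "right": "C1", "score": 96.0},
    {"left": "M2", "right": "C2", "score": 81.5},
    {"left": "M3", "right": "C3", "score": 78.0},
    {"left": "M4", "right": "C4", "score": 55.0},
]


def accepted_counts(pairs, thresholds):
    """Count how many pairs each candidate threshold would accept."""
    return {t: sum(1 for p in pairs if p["score"] >= t) for t in thresholds}


def borderline(pairs, threshold, margin=5.0):
    """Pairs within `margin` of the threshold, highest score first."""
    near = [p for p in pairs if abs(p["score"] - threshold) <= margin]
    return sorted(near, key=lambda p: p["score"], reverse=True)


counts = accepted_counts(scored_pairs, [70.0, 80.0, 90.0])
to_review = borderline(scored_pairs, threshold=80.0)
```

The pairs just above and just below the threshold are the ones worth checking by eye, since that is where false positives and missed matches tend to sit.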
Optimizing fuzzy matching processing time:
Because fuzzy matching can require you to run your module many times, it is prudent to prep your data and save it out to a .yxdb file, which you can then use as an Input to your fuzzy matching module. Alteryx can read a .yxdb file faster than other file types, so this is a great place to start with optimization.
Another step in data preparation is to use the Auto Field tool, which allows Alteryx to select the most appropriate field type and length for every field in your dataset. Depending upon your input data, this can provide shocking improvements in speed.
Assuming you will be doing a merge fuzzy match, your files will require both a record ID field, and a source field; you might as well add them now.
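As an illustration of what the record ID and source fields provide, the Python sketch below tags each row with its originating file and a unique ID before the files are combined. The field names `RecordID` and `Source` and the helper `tag_and_union` are hypothetical; this does not describe how Alteryx's own tools assign IDs.

```python
def tag_and_union(files):
    """Combine several files, adding a unique RecordID and a Source per row."""
    combined = []
    next_id = 1
    for source, rows in files.items():
        for row in rows:
            combined.append({"RecordID": next_id, "Source": source, **row})
            next_id += 1
    return combined


# Hypothetical input files keyed by a source label.
files = {
    "Master": [{"name": "JOHN SMITH"}, {"name": "MARY JONES"}],
    "Customer": [{"name": "JON SMITH"}],
}
combined = tag_and_union(files)
```

With both fields in place, every match can be traced back to a specific row in a specific file, and matches between two rows from the same source are easy to filter out.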
There is also no point in bringing fields into your Fuzzy Match module that you do not need, so use a Select tool to remove them now.
Finally, use your newly optimized files as .yxdb Inputs to your fuzzy matching module. To summarize: prep data in one module, then Fuzzy Match it in another.
Whether you are DeDuping, Merging, or both, try to remove cross-checks and extraneous Browse tools from the module once they are no longer necessary. Cross-checking your data is the most important process in fuzzy matching, but for large modules it is often helpful to remove tools that have served their purpose.
If you are new to Fuzzy Matching, the most important thing you can do is work through the 2 sample modules included with Alteryx (DeDupeFuzzyMatching, and MergeFuzzyMatch), which can be found at File > Open Sample > Advanced Samples.