Similar to the Excel Fuzzy Lookup, the Fuzzy Match Tool(see it in action here) makes it easy for a user to perform inexact matches in their data. By specifying similarity thresholds, utilizing varying matching algorithms, and specifying other configuration options, you can customize the tool to best fit your data set. Due to the high degree of customization in the tool, we recommend ramping up to speed with our introductoryand intermediate live training videosif more complex applications of the tool are anticipated. We also have a list of frequently asked questionsand Fuzzy Matching Tips and Tricksthat can supplement your use of the tool as well!
The Fuzzy Match Tool provides some pretty amazing flexibility for string joins with inexact values – usually in the case of names, addresses, phone numbers, or zip codes because many of the pre-configured match styles are designed around the formats of those types of string structures. However, taking advantage of the custom match style and carefully configuring the tool specific to human entered keyword strings in your data can also allow you to use the loose string matching feature of the tool to match those values to cleaner dictionary keyword strings. If done properly, it can help you take otherwise unusable strings and, matching by each individual word, recombine your human entered data to a standardized format that can be used in more advanced analyses:
In the attached v10.6 workflow, Fuzzy Keyword Match.yxmd, we provide an example of the technique, which can be replicated for most strings using the steps below:
- Use as inputs your string values and a dictionary lookup table (like this one) of standardized words
- Parse your string values into individual words and use the Tile Tool Unique Value Tile Method to keep track of their order in the original string
- Use the Unique Tool to determine the unique words in your strings and assign them a record ID using the Record ID Tool
- If necessary, apply special character or digit parsing (we used the RegEx Tool) to clean the word strings more thoroughly but take special care to ensure that the same words get the same record ID that enters the Fuzzy Match Tool to correctly join the strings back together later in the process
- Fuzzy Match your word strings to dictionary word strings to find the nearest standardized word match – play with the configuration until this looks right! If you have to loosen the Match Threshold to the point where some words are incorrectly matching, but you still aren’t getting others, try “waterfalling” the matching process with another Fuzzy Match Tool just for the lower threshold words you’re looking to add (Union) to the matched set
- Recombine your strings by joining the fuzzy matched dictionary words back to the words they matched with – sort these words by the Tile Sequence Num you obtained earlier in the process and Summarize the strings with a space delimiter to have them back into their intended order
Following these steps should make it possible to make even the most error-filled strings more useable in sentiment or keyword analyses, refine text to be more readable for reporting, and provide you with ample evidence of being a miracle worker. Good luck!