This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
The Fuzzy Match Tool provides some pretty amazing flexibility for string joins with inexact values – usually in the case of names, addresses, phone numbers, or zip codes because many of the pre-configured match styles are designed around the formats of those types of string structures. However, taking advantage of the custom match style and carefully configuring the tool specific to human entered keyword strings in your data can also allow you to use the loose string matching feature of the tool to match those values to cleaner dictionary keyword strings. If done properly, it can help you take otherwise unusable strings and, matching by each individual word, recombine your human entered data to a standardized format that can be used in more advanced analyses:
In the attached v10.6 workflow, Fuzzy Keyword Match.yxmd, we provide an example of the technique, which can be replicated for most strings using the steps below:
Use as inputs your string values and a dictionary lookup table (like this one) of standardized words
Parse your string values into individual words and use the Tile Tool Unique Value Tile Method to keep track of their order in the original string
If necessary, apply special character or digit parsing (we used the RegEx Tool) to clean the word strings more thoroughly but take special care to ensure that the same words get the same record ID that enters the Fuzzy Match Tool to correctly join the strings back together later in the process
Fuzzy Match your word strings to dictionary word strings to find the nearest standardized word match – play with the configuration until this looks right! If you have to loosen the Match Threshold to the point where some words are incorrectly matching, but you still aren’t getting others, try “waterfalling” the matching process with another Fuzzy Match Tool just for the lower threshold words you’re looking to add (Union) to the matched set
Recombine your strings by joining the fuzzy matched dictionary words back to the words they matched with – sort these words by the Tile Sequence Num you obtained earlier in the process and Summarize the strings with a space delimiter to have them back into their intended order
Following these steps should make it possible to make even the most error-filled strings more useable in sentiment or keyword analyses, refine text to be more readable for reporting, and provide you with ample evidence of being a miracle worker. Good luck!