community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

Keyword Fuzzy Matching Strings: Clean Human Entered Data

Community Data Engineer
Community Data Engineer
Created on

Similar to the Excel Fuzzy Lookup, the Fuzzy Match Tool (see it in action here) makes it easy for a user to perform inexact matches in their data. By specifying similarity thresholds, utilizing varying matching algorithms, and specifying other configuration options, you can customize the tool to best fit your data set. Due to the high degree of customization in the tool, we recommend ramping up to speed with our introductory and intermediate live training videos if more complex applications of the tool are anticipated. We also have a list of frequently asked questions and Fuzzy Matching Tips and Tricks that can supplement your use of the tool as well!

 

The Fuzzy Match Tool provides some pretty amazing flexibility for string joins with inexact values – usually in the case of names, addresses, phone numbers, or zip codes because many of the pre-configured match styles are designed around the formats of those types of string structures. However, taking advantage of the custom match style and carefully configuring the tool specific to human entered keyword strings in your data can also allow you to use the loose string matching feature of the tool to match those values to cleaner dictionary keyword strings. If done properly, it can help you take otherwise unusable strings and, matching by each individual word, recombine your human entered data to a standardized format that can be used in more advanced analyses:

 

result.jpg

 

In the attached v10.6 workflow, Fuzzy Keyword Match.yxmd, we provide an example of the technique, which can be replicated for most strings using the steps below:

 

  • Use as inputs your string values and a dictionary lookup table (like this one) of standardized words
  • Parse your string values into individual words and use the Tile Tool Unique Value Tile Method to keep track of their order in the original string
  • Use the Unique Tool to determine the unique words in your strings and assign them a record ID using the Record ID Tool
    • If necessary, apply special character or digit parsing (we used the RegEx Tool) to clean the word strings more thoroughly but take special care to ensure that the same words get the same record ID that enters the Fuzzy Match Tool to correctly join the strings back together later in the process
  • Fuzzy Match your word strings to dictionary word strings to find the nearest standardized word match – play with the configuration until this looks right! If you have to loosen the Match Threshold to the point where some words are incorrectly matching, but you still aren’t getting others, try “waterfalling” the matching process with another Fuzzy Match Tool just for the lower threshold words you’re looking to add (Union) to the matched set
  • Recombine your strings by joining the fuzzy matched dictionary words back to the words they matched with – sort these words by the Tile Sequence Num you obtained earlier in the process and Summarize the strings with a space delimiter to have them back into their intended order

 

workflow.jpg

 

Following these steps should make it possible to make even the most error-filled strings more useable in sentiment or keyword analyses, refine text to be more readable for reporting, and provide you with ample evidence of being a miracle worker. Good luck!

 

win.gif

Attachments
Comments
Quasar
Quasar

Matt,

 

Just to let you know I love two things.

 

1) Your explanation of fuzzy matching practices.

 

2) Your use of  Kanye West.

 

Treyson

Asteroid

I like your pic man!

Meteoroid

HI Matt

I love this:-)

//anitta

Quasar

Excellent!

Good