Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Intra-record Fuzzy Matching Fields

daniel_mmi
9 - Comet

I have a set of data with first/last name sets from two different sources. In three consecutive records, John Smith might be matched with Mary Smith, John Smith and Jan Smith. It seems like there should be a way to use some kind of fuzzy logic to score the degree of similarity between the sets of name fields, then use that to select the most likely match.

 

I've gotten as far as using the soundex() function on each name, and can look at the data and see which records are the most likely matches, but I can't find a function that will give me any sort of indicator of similarity between two fields within the same record. Really, I just need a way to recognize that "N265" resembles "N260" more than it does "K357". I could expand the records vertically and use a standard Fuzzy Match, but I'm looking at doing this on a large dataset, and that might be problematic.

2 REPLIES 2
jack_morgan
9 - Comet

Hey Daniel!

There is a Fuzzy Matching Tool. If you run the tool and compare the data against itself on both top and bottom data streams, it'll come back with match levels and i can say that we've had a great amount of luck from the need that we've had. The tool also allows you to tweak the logic to matching and match confidence levels. This should work much better than a soundex() as well.

 

Let me know if this helps or PM me if you have more questions!

 

Jack

simon_runc
7 - Meteor

hi Daniel,

 

Not sure if you still need help with this...but thought I'd include a link to this excellent video explaining how to use the Fuzzy-matching tool, from Chris Love

 

https://www.youtube.com/watch?v=35aNGzJ5kYY

 

I found this the clearest explanation I've found on how to use the tool, and it's various customisations.

an example of using these three Alteryx tools, and others to deduplicate a dataset using fuzzy logic
Labels