I have a set of data with first/last name sets from two different sources. In three consecutive records, John Smith might be matched with Mary Smith, John Smith and Jan Smith. It seems like there should be a way to use some kind of fuzzy logic to score the degree of similarity between the sets of name fields, then use that to select the most likely match.
I've gotten as far as using the soundex() function on each name, and can look at the data and see which records are the most likely matches, but I can't find a function that will give me any sort of indicator of similarity between two fields within the same record. Really, I just need a way to recognize that "N265" resembles "N260" more than it does "K357". I could expand the records vertically and use a standard Fuzzy Match, but I'm looking at doing this on a large dataset, and that might be problematic.
Solved! Go to Solution.
Hey Daniel!
There is a Fuzzy Matching Tool. If you run the tool and compare the data against itself on both top and bottom data streams, it'll come back with match levels and i can say that we've had a great amount of luck from the need that we've had. The tool also allows you to tweak the logic to matching and match confidence levels. This should work much better than a soundex() as well.
Let me know if this helps or PM me if you have more questions!
Jack
hi Daniel,
Not sure if you still need help with this...but thought I'd include a link to this excellent video explaining how to use the Fuzzy-matching tool, from Chris Love
https://www.youtube.com/watch?v=35aNGzJ5kYY
I found this the clearest explanation I've found on how to use the tool, and it's various customisations.