This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I have a set of data with first/last name sets from two different sources. In three consecutive records, John Smith might be matched with Mary Smith, John Smith and Jan Smith. It seems like there should be a way to use some kind of fuzzy logic to score the degree of similarity between the sets of name fields, then use that to select the most likely match.
I've gotten as far as using the soundex() function on each name, and can look at the data and see which records are the most likely matches, but I can't find a function that will give me any sort of indicator of similarity between two fields within the same record. Really, I just need a way to recognize that "N265" resembles "N260" more than it does "K357". I could expand the records vertically and use a standard Fuzzy Match, but I'm looking at doing this on a large dataset, and that might be problematic.
There is a Fuzzy Matching Tool. If you run the tool and compare the data against itself on both top and bottom data streams, it'll come back with match levels and i can say that we've had a great amount of luck from the need that we've had. The tool also allows you to tweak the logic to matching and match confidence levels. This should work much better than a soundex() as well.
Let me know if this helps or PM me if you have more questions!