I have a large dataset of names in one field that can be a first and last name, a company name, or a combination. What I need to do is find "families" and create a group name for each family. I'm using fuzzy match to do it and it's working to a certain extent, but one case in particular (the Pump brothers) is not getting picked up. I name each group after the first instance in that group. I'm matching on Name (at a very low threshold) with an exact match on Zip. As you can see below, all the Reickers are correctly grouped, as is Pumption. But the Pump brothers Derek, Dustin and Alan should be in the same family. Is there some other way I should be doing this?
Thanks for any suggestions. I'm happy to provide more info if needed.
That is the unfortunate thing with fuzzy match: it is never perfect. Another idea, if you already have the grouping name: use RegEx to parse just the last name out of the grouping name. Then use a Formula tool to check whether the name field contains that last name, and output the group value when it does.
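To make the idea concrete outside of Alteryx, here is a rough Python sketch of the same logic. It is not the workflow itself: the function names `last_name` and `assign_group` are made up for illustration, and it assumes the last name is the final word of the grouping name (e.g. "Derek Pump"). It also uses a whole-word check instead of a plain "contains", which is my own choice so that "Pump" does not pull in "Pumption".

```python
import re

def last_name(group_name):
    """Return the final alphabetic token of a group name, lowercased.

    Assumption: group names look like "First Last", so the last token
    is the family name. Company names would need different handling.
    """
    tokens = re.findall(r"[A-Za-z']+", group_name)
    return tokens[-1].lower() if tokens else ""

def assign_group(name, group_names):
    """Return the first group whose last name appears as a whole word in name."""
    for group in group_names:
        surname = last_name(group)
        # \b keeps "pump" from matching inside "Pumption"
        if surname and re.search(rf"\b{re.escape(surname)}\b", name, re.IGNORECASE):
            return group
    return None  # no existing group matched

# Hypothetical sample rows, using names from the question
groups = ["Derek Pump"]
for name in ["Dustin Pump", "Alan Pump", "Pumption"]:
    print(name, "->", assign_group(name, groups))
```

In Alteryx terms, `last_name` plays the role of the RegEx tool and `assign_group` the Formula tool. You would probably still want to combine this with Zip or another field, since unrelated people can share a last name.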
As a follow-up, I used a waterfall approach to the problem and captured most of the matches. In the first pass I did a fuzzy match on name at a 35% threshold with an exact match on zip, and created a group name. Then I did a second fuzzy match on group name, again at 35%, with another supporting field at 100%, and updated the group name. This found almost all the matches. The two supporting fields minimized the false positives.
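For anyone who wants to try the waterfall idea outside of Alteryx, here is a minimal Python sketch of the two passes. Some loud assumptions: `difflib.SequenceMatcher` stands in for the Fuzzy Match tool (its scores are not the same as Alteryx's, so 35% here is only loosely comparable), the second supporting field is a hypothetical `phone` column, and the helper names `similar`, `group_pass` and `waterfall` are made up for this example.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.35):
    """True when the two strings score at or above the threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def group_pass(records, text_key, exact_key, out_key, threshold=0.35):
    """One waterfall pass.

    A record joins the first earlier group whose exact_key is equal and
    whose text_key is a fuzzy match; otherwise it founds a new group
    named after itself (the first instance of that group).
    """
    anchors = []  # (text, exact value) of each group founded so far
    for rec in records:
        for text, exact in anchors:
            if rec[exact_key] == exact and similar(rec[text_key], text, threshold):
                rec[out_key] = text
                break
        else:
            rec[out_key] = rec[text_key]
            anchors.append((rec[text_key], rec[exact_key]))
    return records

def waterfall(records):
    # Pass 1: fuzzy on name, exact on zip, creates the group name
    group_pass(records, "name", "zip", "group")
    # Pass 2: fuzzy on group name, exact on the other supporting field
    group_pass(records, "group", "phone", "group")
    return records

# Hypothetical rows; the zip and phone values are invented for illustration
rows = [
    {"name": "Derek Pump", "zip": "50001", "phone": "555-0100"},
    {"name": "Dustin Pump", "zip": "50002", "phone": "555-0100"},
    {"name": "Alan Pump", "zip": "50001", "phone": "555-0100"},
    {"name": "Reicker Farms", "zip": "60001", "phone": "555-0199"},
]
for row in waterfall(rows):
    print(row["name"], "->", row["group"])
```

The point of the second pass is that a record missed in pass 1 (here, a different zip) gets another chance through a different exact-match field, while the exact-match requirement in each pass keeps the low fuzzy threshold from creating too many false positives.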
Hi Debbie!
This is a great workflow! Would you mind sharing the .yxmd file here as a reference? I am facing a similar and equally challenging fuzzy matching issue.
Thank you :)