Name Comparison Help
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello everyone, how are you doing?
I am working on an Alteryx workflow and I have a specific problem. Can anybody help me with the below?
I have to compare 2 columns of names which has different formats, here is an example of our data and the result that we want:
Name 1 | Name 2 | Validation (True or false) |
Smith, John | Smith, John M | True |
Keynes, Paul | Keynes, Paula | False |
Bates, Roland | Bates III, Roland | True |
Morgan, Richard | Morgan, Richard Charles | True |
Robert Jr, Michael | Robert, Michael | True |
Beckett, J. Scott | Beckett, Scott | True |
Pumber, Melissa | Pumb, Melissa | False |
Regards,
Julian.
Solved! Go to Solution.
- Labels:
- Community
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
You need to use the Fuzzy Match tool under the Join tab.
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi!
I tried to use that tool before but I didn't understand it.
On the one hand, what mode should I use? purge or merge?
On the other hand, which configurations should I use to get the result that I want?
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi!
Not many people understand it to be honest. I struggle with it too.
But to answer your question, you use Purge when you have 1 source file to find all the duplicates across it & to keep one master record’s details, in your case, you have 1 source file. You use merge when you have 2 files, one to compare and one to benchmark with a realiable master dataset.
Configuration wise, that is up to you really. Fuzzy match is an art, not a science. So there's no hard rule for it, but you can use Names w Nicknames, and set the threshold accordingly to what you are comfortable with.
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
https://www.thedataschool.co.uk/frederik-egervari/fuzzy-matching-in-alteryx/
@FrederikE I presume you wrote this piece? Perhaps you can take a jab at this problem, I'm stumped on Fuzzy Match myself.
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey @julianegan, @caltang
Yes, that blog is from me. But I would not choose Fuzzy Matching at all here. Fuzzy Matching works well with word additions or if they sound differently.
It won't do very well with Paul vs. Paula. We only know this is wrong because they are distinct names (and both are common), but semantically they are very similar so Fuzzy M won't be able to pick such things up.
I would recommend a more usuall Alteryx approach. My go to would be a two-step solution (still won't be 100% accurate):
1. A lookup table that is used to eliminate common titles, etc. (Jr., Ms., etc.)
2. A Regex Logic to clean the strings (e.g. remove single digit words ("M" in "Smith, John M")
see the attached workflow.
 
There won't be a solution that covers all error possibilities. Hope I could help with this.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks @FrederikE ! Agreed, after re examining the question I think you hit the nail right on its head.
@julianegan This process is an art more of a science, so you may not hit perfection with your entire dataset. Maybe those that get filtered out, you may have to visually compare them for now.
Danke schön @FrederikE !
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello!
Thank you for your time, willingness to teach and the knowledge @FrederikE y @caltang !
Greetings,
Julian!