General Discussions

Discuss any topics that are not product-specific here.
SOLVED

Name Comparison Help

julianegan
5 - Atom

Hello everyone, how are you doing?

 

I am working on an Alteryx workflow and I have a specific problem. Can anybody help me with the below?

I have to compare 2 columns of names which has different formats, here is an example of our data and the result that we want:

 

Name 1Name 2Validation (True or false)
Smith, JohnSmith, John MTrue
Keynes, PaulKeynes, PaulaFalse
Bates, RolandBates III, Roland True
Morgan, RichardMorgan, Richard CharlesTrue
Robert Jr, MichaelRobert, MichaelTrue
Beckett, J. ScottBeckett, ScottTrue
Pumber, MelissaPumb, MelissaFalse

 

Regards,

Julian.

 

7 REPLIES 7
caltang
17 - Castor
17 - Castor

You need to use the Fuzzy Match tool under the Join tab.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
julianegan
5 - Atom

Hi!

 

I tried to use that tool before but I didn't understand it.

On the one hand, what mode should I use? purge or merge? 

On the other hand, which configurations should I use to get the result that I want?

 

Thanks!

caltang
17 - Castor
17 - Castor

Hi! 

 

Not many people understand it to be honest. I struggle with it too.

 

But to answer your question, you use Purge when you have 1 source file to find all the duplicates across it & to keep one master record’s details, in your case, you have 1 source file. You use merge when you have 2 files, one to compare and one to benchmark with a realiable master dataset.

 

Configuration wise, that is up to you really. Fuzzy match is an art, not a science. So there's no hard rule for it, but you can use Names w Nicknames, and set the threshold accordingly to what you are comfortable with.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
caltang
17 - Castor
17 - Castor

https://www.thedataschool.co.uk/frederik-egervari/fuzzy-matching-in-alteryx/ 

 

@FrederikE I presume you wrote this piece? Perhaps you can take a jab at this problem, I'm stumped on Fuzzy Match myself.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
FrederikE
13 - Pulsar

Hey @julianegan, @caltang

 

Yes, that blog is from me. But I would not choose Fuzzy Matching at all here. Fuzzy Matching works well with word additions or if they sound differently.

It won't do very well with Paul vs. Paula. We only know this is wrong because they are distinct names (and both are common), but semantically they are very similar so Fuzzy M won't be able to pick such things up.

 

I would recommend a more usuall Alteryx approach. My go to would be a two-step solution (still won't be 100% accurate):

1. A lookup table that is used to eliminate common titles, etc. (Jr., Ms., etc.)

2. A Regex Logic to clean the strings (e.g. remove single digit words ("M" in "Smith, John M")

 

see the attached workflow. 

 

Unbenannt.png

 

There won't be a solution that covers all error possibilities. Hope I could help with this. 

caltang
17 - Castor
17 - Castor

Thanks @FrederikE ! Agreed, after re examining the question I think you hit the nail right on its head. 

 

@julianegan This process is an art more of a science, so you may not hit perfection with your entire dataset. Maybe those that get filtered out, you may have to visually compare them for now. 

Danke schön @FrederikE !

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
julianegan
5 - Atom

Hello!

 

Thank you for your time, willingness to teach and the knowledge @FrederikE y @caltang !

 

Greetings,

Julian!

Labels
Top Solution Authors