Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Using fuzzy match on a list of names that vary significantly

Kearnd967
7 - Meteor

Hi all,

 

I have a data set containing a list of manually entered names, so the same person can appear under several different spellings.  For example:

 

Colin Hayward
Colin Haywood
Colin Hayword
Colin Heywood
Collin Hayword

 

I can spot by eye that these are all the same person, but how can I get Alteryx to do this for me?  The list is too long to maintain a lookup index, and the spellings could change with each report.  I have also tried matching on first initial and surname, but that assumes the first initial and the surname are spelt correctly.

 

I would like Fuzzy Match logic, or an equivalent, that could get me to a 90% solution with only a bit of manual work left over.  I'm not sure a 100% solution is achievable here.

 

Any ideas would be really appreciated.

 

David.

 

2 REPLIES
FinnCharlton
13 - Pulsar

Hi @Kearnd967, you can try fuzzy matching. In simple situations like your example, it can work very well:

 

 

[Screenshot: image.png]

 

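However, it also has some problems. With dynamic data, it can be hard or even impossible to eliminate incorrect matches, because a threshold loose enough to catch real misspellings can also pair up names that belong to different people. Because of this, I'd advise being very careful when using this tool in dynamic workflows.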
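
To illustrate the incorrect-match problem outside Alteryx, here is a minimal Python sketch. It uses the standard library's difflib.SequenceMatcher, which is not the algorithm the Fuzzy Match tool uses, and the names "John Smith" and "Joan Smith" are hypothetical examples that are not from the original data. The point is only that two different people can score higher than two spellings of the same person.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0 (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Two spellings of the same person, taken from the question.
same_person = ratio("Colin Hayward", "Colin Heywood")

# Hypothetical pair of two different people (assumption, not source data).
different_people = ratio("John Smith", "Joan Smith")

if __name__ == "__main__":
    print(f"same person, different spelling: {same_person:.3f}")
    print(f"different people, similar names: {different_people:.3f}")
```

If the second score comes out at or above the first, no single threshold can accept the genuine match and reject the false one, which is why some manual review usually remains.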

ChrisTX
15 - Aurora

The attached configuration identifies all duplicates in your sample data.

 

Try different options for Match Function, such as Jaro Distance and Levenshtein, and experiment with different Match Threshold values.
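
To get a feel for how the threshold changes the result, here is a rough Python sketch of Levenshtein-based matching on the sample names. It is not how the Fuzzy Match tool is implemented: the similarity formula (1 minus edit distance divided by the longer length), the 0.80 and 0.90 thresholds, and the function names are all assumptions chosen for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive score in [0, 1]; 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def matching_pairs(names, threshold):
    """All pairs of names whose similarity is at or above the threshold."""
    return [(a, b) for a, b in combinations(names, 2)
            if similarity(a, b) >= threshold]


def group_names(names, threshold):
    """Merge matched pairs transitively, so A~B and B~C share one group."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for a, b in matching_pairs(names, threshold):
        parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


names = ["Colin Hayward", "Colin Haywood", "Colin Hayword",
         "Colin Heywood", "Collin Hayword"]

if __name__ == "__main__":
    for threshold in (0.80, 0.90):
        pairs = matching_pairs(names, threshold)
        groups = group_names(names, threshold)
        print(threshold, len(pairs), "pairs,", len(groups), "group(s)")
```

Raising the threshold drops the weaker pairs (for example "Colin Hayward" and "Colin Heywood", which are three edits apart), but because matches are merged transitively the five sample names can still end up in one group. On real data, the same transitive merging is also what lets one bad match pull unrelated names together.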

 

Like you mentioned, a Fuzzy Match will never be perfect.

 

Chris
