Alteryx Designer Desktop Discussions

Using fuzzy match on a list of names that vary significantly

Kearnd967
Meteor

Hi all,

I have a data set with a list of names that are entered manually, so the same person can appear under several different spellings. For example:

 

Colin Hayward
Colin Haywood
Colin Hayword
Colin Heywood
Collin Hayword

 

I can spot that these are all the same person, but how can I get Alteryx to do this for me? The list is too long to maintain an index of known spellings, and the spelling of a name could change with each report. I have also tried matching on first initial and surname, but that assumes the first initial and the surname are spelt correctly.

 

I would like a Fuzzy Match logic or equivalent that gets me to a 90% solution, with only a bit of manual work left over. I'm not sure a 100% solution is achievable here.

 

Any ideas would be really appreciated.

 

David.

 

2 Replies
FinnCharlton
Pulsar

Hi @Kearnd967, you can try fuzzy matching. In simple situations like your example, it can work very well:

[Image: image.png]

 

However, it also has some problems. With dynamic data, it can be hard or even impossible to eliminate incorrect matches. Because of this, I'd advise being very careful about using this tool in dynamic workflows.

ChrisTX
Aurora

The attached configuration identifies all duplicates in your sample data.

 

Try different options for Match Function, such as Jaro Distance and Levenshtein, and experiment with different Match Threshold values.
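To get a feel for how the match function and threshold interact, here is a rough Python sketch using a Levenshtein-based similarity on the sample names from the question. It is only an illustration and not how the Fuzzy Match tool is implemented internally: the function names (`levenshtein`, `similarity`, `group_names`) are made up for this example, the score is normalised by the longer name's length (an assumption), and the grouping is a simple greedy pass that compares each name only to the first member of each group.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character inserts, deletes, and substitutions
    needed to turn a into b (classic dynamic-programming version)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive score in [0, 1]; 1.0 means identical.
    Normalising by the longer string is an assumption of this sketch."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_names(names, threshold=0.8):
    """Greedy grouping: a name joins the first group whose first member
    scores at or above the threshold, otherwise it starts a new group."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Sample names from the original question.
names = [
    "Colin Hayward",
    "Colin Haywood",
    "Colin Hayword",
    "Colin Heywood",
    "Collin Hayword",
]

for threshold in (0.75, 0.8):
    print(threshold, group_names(names, threshold=threshold))
```

With this particular scoring, a threshold of 0.75 puts all five spellings in one group, while 0.8 splits "Colin Heywood" off on its own, because it is three edits away from "Colin Hayward". The same trade-off applies in the other direction: the lower the threshold, the more likely it is that two genuinely different people get merged.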

 

As you mentioned, a Fuzzy Match will never be perfect.

 

Chris
