

name similarity matching

Karthik_7694
8 - Asteroid

I need help solving the requirement below.

 

Requirement:

I have two datasets, each with a single column called Name that contains a list of user names. When a user inputs a name from dataset 1, similar names from dataset 2 need to be shown along with their similarity score (name matching score). I have tried the FuzzyWuzzy package, but it takes too long to return results when the datasets are large. It would be a great help if someone could guide me or suggest a solution.

I have also looked at algorithms such as Soundex, cosine similarity, BK-trees, and Levenshtein distance, but none of them solved the requirement.

 

Thanks, Karthik

 
1 REPLY
danilang
19 - Altair

Hi @Karthik_7694 

 

It sounds like you've already exhausted all avenues. One step to include in any algorithm you try is to first do a straight join on the two datasets. A join is quick at finding exact matches, and if you remove those first, you'll have fewer names left to get fuzzy wuzzy with.
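As a rough sketch of that first step outside of Alteryx, the Python below splits off the exact matches with a set lookup so that only the leftover names go through fuzzy scoring. The function names `split_exact_matches` and `similarity` are made up for illustration, and the score uses the standard library's `difflib.SequenceMatcher` as a stand-in rather than FuzzyWuzzy itself.

```python
from difflib import SequenceMatcher


def split_exact_matches(names_1, names_2):
    """Split names_1 into exact matches against names_2 and the remainder.

    Hypothetical helper: the set lookup plays the role of the straight join.
    """
    lookup = set(names_2)
    exact = [name for name in names_1 if name in lookup]
    remaining = [name for name in names_1 if name not in lookup]
    return exact, remaining


def similarity(a, b):
    """Return a 0-100 similarity score (stand-in for a FuzzyWuzzy ratio)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


if __name__ == "__main__":
    data_1 = ["Ann Lee", "Jon Smith"]
    data_2 = ["Ann Lee", "John Smith"]
    exact, remaining = split_exact_matches(data_1, data_2)
    print("exact:", exact)
    # Only the names without an exact match need the slow fuzzy comparison.
    for name in remaining:
        for candidate in data_2:
            print(name, "vs", candidate, similarity(name, candidate))
```

In a workflow, the same split would come from the matched and unmatched outputs of the join.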

 

Another point to consider is that since you already have the two lists, you can pre-calculate the scores and store them keyed on name. When a user enters a name from one list, look up the pre-crunched scores in the db and return the matching names from the second list. Even if the pre-calculation step takes days to complete, you'll only have to do it once.
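A minimal sketch of the pre-calculation idea, under some assumptions: a plain Python dict stands in for the db, `difflib.SequenceMatcher` stands in for whatever scoring function you settle on, and `build_score_table`, `top_k` and `min_score` are hypothetical names and cutoffs chosen only to keep the stored table small.

```python
import heapq
from difflib import SequenceMatcher


def score(a, b):
    """Return a 0-100 similarity score (stand-in for the real scorer)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def build_score_table(names_1, names_2, top_k=5, min_score=60):
    """Precompute, for each name in names_1, its best matches in names_2.

    This is the slow, one-off step. The result maps each name to a list of
    (matched_name, score) pairs sorted from highest to lowest score.
    """
    table = {}
    for name in names_1:
        scored = ((score(name, other), other) for other in names_2)
        kept = (pair for pair in scored if pair[0] >= min_score)
        best = heapq.nlargest(top_k, kept)
        table[name] = [(other, s) for s, other in best]
    return table


if __name__ == "__main__":
    data_1 = ["Jon Smith"]
    data_2 = ["John Smith", "Jane Doe", "Jon Smyth"]
    table = build_score_table(data_1, data_2)
    # At query time the lookup is a single keyed read, with no scoring at all.
    print(table["Jon Smith"])
```

The table would need rebuilding, or at least topping up, whenever either list changes.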

 

Dan 
