I need help with the following requirement.
Requirement:
I have two datasets, each with a single column called Name that contains a list of user names. When a user inputs a name from dataset 1, similar names from dataset 2 need to be shown along with their similarity score (name matching score). I have tried the FuzzyWuzzy package, but it takes too long to return results when the datasets are large. It would be a great help if someone could guide me or suggest a solution.
I have also looked at algorithms such as Soundex, cosine similarity, BK-trees, and Levenshtein distance, but they do not solve the requirement.
Thanks, Karthik
It sounds like you've already exhausted all avenues. One step to include in any algorithm you try is to first do a straight join on the two datasets. A join is quick at finding exact matches, and if you remove those first, you'll have fewer names left to get fuzzy wuzzy with.
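As a rough sketch of that idea, assuming both datasets fit in memory as plain Python lists: the exact matches are peeled off with a set lookup, and only the leftovers go through the slow fuzzy comparison. The function names and the threshold of 80 are made up for illustration, and the standard library's difflib stands in for FuzzyWuzzy here so the snippet has no dependencies.

```python
from difflib import SequenceMatcher


def split_exact_and_remaining(names_1, names_2):
    """Return (names with an exact match in names_2, names without one)."""
    lookup = set(names_2)  # set membership is O(1) on average, like a hash join
    exact = [n for n in names_1 if n in lookup]
    remaining = [n for n in names_1 if n not in lookup]
    return exact, remaining


def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (difflib as a stand-in scorer)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def fuzzy_matches(name, candidates, threshold=80):
    """Return (candidate, score) pairs at or above threshold, best score first."""
    scored = [(c, similarity(name, c)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    data_1 = ["Karthik", "Dan"]
    data_2 = ["Dan", "Kartik"]
    exact, remaining = split_exact_and_remaining(data_1, data_2)
    print("exact:", exact)
    for name in remaining:
        print(name, "->", fuzzy_matches(name, data_2))
```

Only the names in `remaining` ever reach the fuzzy step, so the more exact matches the two lists share, the more work is saved.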
Another point to consider: since you already have both lists, you can pre-calculate the scores and store them keyed on name. When a user enters a name from one list, use the db to look up the pre-crunched scores for that name and return the corresponding names from the second list. Even if the pre-calculation step takes days to complete, you'll only have to do it once.
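Something along these lines could work for the pre-calculation, as a sketch only. It assumes SQLite as the db, a hypothetical table called `scores`, and a made-up cutoff of 60 below which pairs are not stored (otherwise the table grows to every possible pair). difflib is again a stand-in for whichever scorer you settle on.

```python
import sqlite3
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (difflib as a stand-in scorer)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def scored_pairs(names_1, names_2, threshold):
    """Yield (name_1, name_2, score) for every pair at or above threshold."""
    for a in names_1:
        for b in names_2:
            score = similarity(a, b)
            if score >= threshold:
                yield a, b, score


def build_score_table(conn, names_1, names_2, threshold=60):
    """One-off, slow step: score the pairs and store them keyed on name_1."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores "
        "(name_1 TEXT, name_2 TEXT, score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?, ?)",
        scored_pairs(names_1, names_2, threshold),
    )
    # The index is what makes the later per-name lookup fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_scores_name_1 ON scores (name_1)")
    conn.commit()


def lookup(conn, name):
    """Fast step at query time: fetch stored matches for name, best first."""
    cursor = conn.execute(
        "SELECT name_2, score FROM scores WHERE name_1 = ? ORDER BY score DESC",
        (name,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to keep the results
    build_score_table(conn, ["Karthik"], ["Kartik", "Dan"])
    print(lookup(conn, "Karthik"))
```

The expensive nested loop runs once in `build_score_table`; after that each user query is a single indexed SELECT.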
Dan