Dear All,
I need to compare two columns that contain company names and determine how many rows match and how many do not, along with a 'Match Score' for each pair.
Example:
Name | Name2 |
ABC LIMITED | ABC LTD |
XYZ PRIVATE LTD | XYZ PVT LTD |
123 PUBLIC LTD | 123 PUBLIC LTD |
I need the output to look like this:
Name | Name2 | Match Score |
ABC LIMITED | ABC LTD | 90 |
XYZ PRIVATE LTD | XYZ PVT LTD | 90 |
123 PUBLIC LTD | 123 PUBLIC LTD | 100 |
What tool will generate this output, and how will it be configured? Kindly assist. I tried 'Fuzzy Match Tool' - but no luck.
Hey @ravikumar060987,
Here is one way to do this:
I checked the example workflow here:
In their example they put everything in one column to match on companies.
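For anyone who wants to see the idea outside Alteryx, here is a rough Python sketch of the "everything in one column" step. The `stack_columns` helper and the `record_id` / `source` fields are my own hypothetical names, not anything from the example workflow; the company names are the ones from the question.

```python
# Two columns from the question's example.
name = ["ABC LIMITED", "XYZ PRIVATE LTD", "123 PUBLIC LTD"]
name2 = ["ABC LTD", "XYZ PVT LTD", "123 PUBLIC LTD"]


def stack_columns(first, second):
    """Stack two columns into one list of (record_id, source, company) rows.

    record_id is the original row number and source records which column the
    value came from. Both are hypothetical fields added for illustration.
    """
    stacked = []
    for record_id, (a, b) in enumerate(zip(first, second), start=1):
        stacked.append((record_id, "Name", a))
        stacked.append((record_id, "Name2", b))
    return stacked


stacked = stack_columns(name, name2)
for row in stacked:
    print(row)
```

Keeping the row number and the source column next to each value means the pairs can still be put back together after matching.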
Any questions or issues please ask :)
HTH!
Ira
@IraWatt - Thanks for the quick update.
Quick clarification: why is there a duplicate value for the second and third rows, but not the first?
Hello @ravikumar060987
I did it using Fuzzy Match. See below
This video also explains the process: https://community.alteryx.com/t5/Archived-Training/Fuzzy-Matching-Intermediate-Users/m-p/43852
Cheers!
Hi @ravikumar060987 ,
The Alteryx Academy is a great place to look for content on how to use some of the more advanced tools, like Fuzzy Match, which will indeed give you what you want. It is a difficult tool to master, though, and will take some effort to learn. Whenever I use it, I have to refresh myself on it using some of the great free resources Alteryx provides.
Here are just a few:
Hopefully this gets you on your way; cheers!
@ravikumar060987 I think it's because they have different match keys (I'm not a huge expert on matching):
However, a simple Summarize can fix it:
Fuzzy Match will create many rows per name, one for each other name it scores against. That's why it's a best practice to sort "Match Score" in descending order, then add a Unique tool to keep only the ID/name with the highest score.
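If it helps to see the logic written out, here is a minimal Python sketch of the same "sort descending, then keep the first row per name" idea. The rows and scores are made-up illustration data, and `keep_best_match` is a hypothetical helper, not an Alteryx function.

```python
# Hypothetical match output: (name, matched_name, match_score).
# The scores are illustrative only, not real Fuzzy Match results.
rows = [
    ("XYZ PRIVATE LTD", "XYZ PVT LTD", 90),
    ("XYZ PRIVATE LTD", "XYZ PUBLIC LTD", 72),
    ("ABC LIMITED", "ABC LTD", 90),
    ("ABC LIMITED", "ABD LTD", 65),
]


def keep_best_match(match_rows):
    """Sort by score, highest first, then keep the first row seen per name."""
    best = {}
    ordered = sorted(match_rows, key=lambda r: r[2], reverse=True)
    for name, matched_name, score in ordered:
        # setdefault only stores the first (highest-scoring) row for a name.
        best.setdefault(name, (name, matched_name, score))
    return list(best.values())


best_rows = keep_best_match(rows)
for row in best_rows:
    print(row)
```

The sort plays the role of the Sort tool and the dictionary plays the role of the Unique tool: once a name has been seen, any later (lower-scoring) row for it is ignored.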
@christine_assaad thank you for the quick update.
However, I have the source data in a tool. And because there are so many records, switching to another input file is not an option.
Is there another way to get this done quickly?
@IraWatt - That's a good piece of information.
In this case you can use Fuzzy Match in Purge mode. Purge is used for deduping when all records are coming from the same source.
The process will look similar to what @IraWatt sent. It's attached as well.
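As a rough illustration of what "all records coming from the same source" means, here is a small Python sketch that compares every name in one list against every other name in that list. It uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer, which is not the algorithm Fuzzy Match uses, so the scores will not line up with Alteryx's. The `score` and `pairs_within_source` helpers and the threshold of 70 are my own assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

# One stacked list of names, all from the same source.
names = [
    "ABC LIMITED",
    "ABC LTD",
    "XYZ PRIVATE LTD",
    "XYZ PVT LTD",
    "123 PUBLIC LTD",
]


def score(a, b):
    """Similarity from 0 to 100 using difflib (a stand-in, not Alteryx's scoring)."""
    return round(SequenceMatcher(None, a.upper(), b.upper()).ratio() * 100)


def pairs_within_source(values, threshold=70):
    """Compare each value with every other value once; keep pairs at or above threshold."""
    matches = []
    for a, b in combinations(values, 2):
        s = score(a, b)
        if s >= threshold:
            matches.append((a, b, s))
    return matches


matches = pairs_within_source(names)
for a, b, s in matches:
    print(a, "|", b, "|", s)
```

Note that this compares every pair, so it gets slow on very large lists. As far as I understand, that cost is what the match keys in Fuzzy Match are meant to cut down on, by limiting which records get compared.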