Hello fellow Alteryx fans!
I'm teaching myself how to use the Fuzzy Match tool, but I've hit a bit of a wrinkle. I was wondering if any of you bright sparks might be able to help, please?
I attach my dummy workflow where you will see that I have a list of company names. I want to match these names to eliminate duplicates. Some of the names are followed by the company form, for example Inc or LLC. No great problem here.
However, some of the company names contain typos, for example a missing final character.
I have generated my fuzzy match, and I'm very surprised that the tool is treating names like "Amphiy" and "Amphiyo" as distinct.
Now this is a completely forced example, so let's not just assume that it's always simply the final character that is missing…
Is there a way that I could improve the matching logic so that fields like "Amphiy" and "Amphiyo" are brought together?
Very interested to learn more about this tool, and looking forward to hearing your good advice!
Many thanks,
Jonathan
Hey Jonathan ( @jonathanogrady )
I must say - my learning method is similar to yours - take one tool at a time and work with it until I master it, so I have a lot of sympathy for your process.
Unfortunately your sample data did not come through, but I've mocked some up in a text input including the specific case you mentioned (Amphiy) and I'm getting a solid match.
The key is that you sometimes need to do multiple passes of Fuzzy Matching to get to a final set. In this case, I did the first pass using the Company Name type matching logic, and the second pass using Soundex (sounds like) with no punctuation.
This stripped the data set down very nicely to exactly what we'd expect.
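For anyone who would like to see the two-pass idea outside of Alteryx, here is a rough Python sketch. It is only an illustration and not how the Fuzzy Match tool works internally: the `company_key` clean-up, the `LEGAL_SUFFIXES` list and the function names are all assumptions made for the example, and the Soundex is a plain textbook implementation.

```python
import re

# Hypothetical list of legal forms to ignore (an assumption for this sketch).
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "gmbh", "sa"}

# Standard Soundex consonant codes; vowels, h, w and y have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def company_key(name):
    """Pass 1 (assumed clean-up): lower-case, drop punctuation and legal forms."""
    words = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def soundex(word):
    """Pass 2: textbook Soundex code (first letter plus three digits)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate consonants with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets the previous code
    return (encoded + "000")[:4]


def two_pass_groups(names):
    """Group the raw names by the Soundex code of their cleaned-up key."""
    groups = {}
    for name in names:
        groups.setdefault(soundex(company_key(name)), []).append(name)
    return groups


if __name__ == "__main__":
    sample = ["Amphiy", "Amphiyo Inc.", "Amphiyo LLC", "Zoomzone Ltd"]
    for code, members in two_pass_groups(sample).items():
        print(code, members)
```

In this sketch "Amphiy" and "Amphiyo" both reduce to the Soundex code A510, because trailing vowels do not contribute a digit, so the second pass pulls them together even though the spellings differ.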
Hopefully this gets you to the solution you need - if so, would you mind marking as solved? If not, feel free to reply with a revised workflow with additional data built in.
Cheers Jonathan
Sean
Sean,
Thank you so much for taking the time to look into my question.
I'm sorry my original dataset didn't come through. I have looked at your solution but I'm getting a little confused with the different naming conventions.
I have attached my original dataset of mocked up data. It would be good to see your solution with the original data if that's at all possible?
Very close to a solution, thank goodness!
Best wishes,
Jonathan
Sean,
This is fantastic! It has greatly aided my learning, and will hopefully help others too.
Thanks a million for following up.
Best wishes,
Jonathan
Sean,
I've been thinking further about this fuzzy match example, and I had some feedback for you if you have time?
You kindly proposed a solution to my "find similar companies" question.
However, if I understand your solution correctly, it does not achieve a high level of integration. Take your example above: record 1 matches records 2 and 3, but records 4 and 5 are each treated as unique, even though these are all the same company. I would like to get to a solution where the fuzzy match at least suggests that they are.
I have worked on this further myself, and attach a proposed workaround. I would be interested to know what you think/how it can be improved (it's not perfect by any stretch!)
I have taken the proposed match keys, and then I have run a fuzzy match on those keys to group similar keys together. I then use those grouped keys to group the original data.
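To make the idea concrete outside of Alteryx, it could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not what the Fuzzy Match tool does: the similarity measure (`difflib.SequenceMatcher`), the `threshold` of 0.8 and the function names are all hypothetical choices for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def cluster_keys(keys, threshold=0.8):
    """Map each match key to a representative key for its cluster.

    Keys whose similarity ratio reaches the (assumed) threshold are merged,
    and merges are transitive: if A~B and B~C, all three share one cluster.
    """
    parent = {k: k for k in keys}

    def find(k):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a, b in combinations(sorted(parent), 2):
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            parent[find(a)] = find(b)

    return {k: find(k) for k in parent}


def group_original(records, threshold=0.8):
    """records: iterable of (original_name, match_key) pairs.

    Returns {cluster_representative: [original names in that cluster]}.
    """
    records = list(records)
    roots = cluster_keys({key for _, key in records}, threshold)
    groups = {}
    for name, key in records:
        groups.setdefault(roots[key], []).append(name)
    return groups


if __name__ == "__main__":
    sample = [("Amphiy", "amphiy"), ("Amphiyo Inc", "amphiyo"),
              ("Amphiyo LLC", "amphiyo"), ("Zoomzone Ltd", "zoomzone")]
    for root, names in group_original(sample).items():
        print(root, names)
```

Because the merging is transitive, a chain of near-misses ends up in one group, which is the "higher integration" effect; the trade-off is that a threshold set too low can chain unrelated companies together.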
As this is a forced example, I know that of the 480 original company names, there are 80 genuine companies, with various legal extensions such as GmbH, SA, Ltd et cetera.
My solution gets me down to 95 suggested distinct companies.
I wonder if this is an appropriate use of the fuzzy match tool? Would love to know your thoughts!
Best wishes,
Jonathan
PS: raw data was attached previously
Thank you Jonathan - very interested to see your updates - will try to take a look tomorrow morning or on the weekend
Cheers!
Sean