Alteryx Designer Desktop Discussions

jonathanogrady · 05-07-2017

Hello fellow Alteryx fans!

I'm teaching myself how to use the Fuzzy Match tool, but I've hit a bit of a wrinkle. I was wondering if any of you bright sparks might be able to help, please?

I attach my dummy workflow where you will see that I have a list of company names. I want to match these names to eliminate duplicates. Some of the names are followed by the company form, for example Inc or LLC. No great problem here.

However, some of the company names contain typos, for example a missing final character.

I have generated my fuzzy match, and I'm very surprised that the tool is treating names like "Amphiy" and "Amphiyo" as distinct.

Now this is a completely forced example, so let's not just assume that it's always simply the final character that is missing…

Is there a way that I could improve the matching logic so that fields like "Amphiy" and "Amphiyo" are brought together?

Very interested to learn more about this tool, and looking forward to hearing your good advice!

Many thanks,

Jonathan

ThizViz · 05-07-2017

You'll want to make sure that your match key is one of the output fields and take a look at it. Often the match key does not include all the vowels, so it may be reducing your company names to three consonants or something to that effect.
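To illustrate why it is worth looking at the key, here is a minimal Python sketch of a vowel-dropping key. The `crude_key` function is purely hypothetical and is not how Alteryx builds its match keys; it only shows how names can collapse to the same key, or fail to, depending on what the key keeps.

```python
import re

VOWELS = set("AEIOU")

def crude_key(name: str) -> str:
    """Hypothetical key: letters only, uppercased, first letter kept,
    vowels dropped from the rest. Not the Alteryx algorithm."""
    letters = re.sub(r"[^A-Za-z]", "", name).upper()
    if not letters:
        return ""
    return letters[0] + "".join(ch for ch in letters[1:] if ch not in VOWELS)

for name in ["Amphiy", "Amphiyo", "Amphiyo Inc"]:
    print(name, "->", crude_key(name))
```

With this toy key, "Amphiy" and "Amphiyo" share a key, while "Amphiyo Inc" does not, because the legal form ends up in the key. Inspecting the real key output in the same way should show which part of the name is keeping two records apart.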

At that point you can go back and adjust your match settings so that the key has to be more specific, or try a different kind of match. Also, one of the tricks of the trade is to do sequential matches, starting with more loosely defined criteria, then taking the results that are left and subjecting them to more rigorous criteria. It's basically a cascading flow of fuzzy matches.

@thizviz aka cbridges, Bolide
http://community.alteryx.com/t5/user/viewprofilepage/user-id/2328

SeanAdams · 05-07-2017

Hey Jonathan ( @jonathanogrady )

I must say - my learning method is similar to yours - take one tool at a time and work with it until I master it, so I have a lot of sympathy for your process.

Unfortunately your sample data did not come through, but I've mocked some up in a text input including the specific case you mentioned (Amphiy) and I'm getting a solid match.

The key is that sometimes you need to do multiple passes of Fuzzy Matching in order to get to a final set - in this case, I did the first pass using the Company Name type matching logic; and the second pass using Soundex (sounds like) with no punctuation.

This stripped the data set down very nicely to exactly what we'd expect.
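For anyone who wants to see the idea outside of Designer, here is a rough Python sketch of the same two-pass approach. It is an illustration under assumptions, not what the Fuzzy Match tool does internally: the `LEGAL_FORMS` list, `strip_legal_form`, and `two_pass_groups` are made up for this example, and `soundex` is a plain implementation of standard American Soundex.

```python
import re
from collections import defaultdict

# Assumed list of legal forms to ignore in pass 1; extend as needed.
LEGAL_FORMS = {"INC", "LLC", "LTD", "GMBH", "SA"}

# Standard American Soundex letter-to-digit table.
SOUNDEX_CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for ch in letters:
        SOUNDEX_CODES[ch] = digit

def strip_legal_form(name):
    """Pass 1: drop punctuation and trailing legal forms such as Inc or LLC."""
    words = re.sub(r"[^A-Za-z ]", " ", name).upper().split()
    while len(words) > 1 and words[-1] in LEGAL_FORMS:
        words.pop()
    return " ".join(words)

def soundex(word):
    """Pass 2: four-character Soundex code computed over the letters of word."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so a repeated code is kept
    return (first + "".join(digits) + "000")[:4]

def two_pass_groups(names):
    """Group names by the Soundex code of their cleaned form."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(strip_legal_form(name))].append(name)
    return dict(groups)

print(two_pass_groups(["Amphiyo Inc", "Amphiyo", "Amphiy", "Amphiy LLC"]))
```

In this sketch all four spellings land in one group, because "Amphiy" and "Amphiyo" both reduce to the same Soundex code once the legal form is gone. Soundex is coarse, so on real data it can also pull together names that merely sound alike.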

Hopefully this gets you to the solution you need - if so, would you mind marking as solved? If not, feel free to reply with a revised workflow with additional data built in.

Cheers Jonathan

Sean

jonathanogrady · 05-08-2017

Sean,

Thank you so much for taking the time to look into my question.

I'm sorry my original dataset didn't come through. I have looked at your solution but I'm getting a little confused with the different naming conventions.

I have attached my original dataset of mocked up data. It would be good to see your solution with the original data if that's at all possible?

Very close to a solution, thank goodness!

Best wishes,

Jonathan

SeanAdams · 05-08-2017

Here you go Jonathan,

I've pulled your data into the text input so it's all baked into the workflow, and taken a screenshot with the data so that you can see that the company you were looking at is now matching.

jonathanogrady · 05-08-2017

Sean,

This is fantastic! It has greatly aided my learning, and will hopefully help others too.

Thanks a million for following up.

Best wishes,

Jonathan

jonathanogrady · 05-09-2017

Sean,

I've been thinking further about this fuzzy match example, and I had some feedback for you if you have time?

You kindly proposed a solution to my "find similar companies" question.

However, if I understand your solution correctly, it does not consolidate the records very far. Take your example above: record 1 matches records 2 and 3, but records 4 and 5 are each treated as unique, even though all five are the same company. I would like to get to a solution where the fuzzy match at least suggests that they are.

I have worked on this further myself, and attach a proposed workaround. I would be interested to know what you think/how it can be improved (it's not perfect by any stretch!)

I have taken the proposed match keys and then run a fuzzy match on those keys to group similar keys together. I then use those grouped keys to group the original data.
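Outside of Designer, the idea can be sketched in Python roughly as follows. This is only an illustration under assumptions: `group_similar` is a made-up helper, the similarity score comes from the standard library's `difflib.SequenceMatcher` rather than the Fuzzy Match tool, and the 0.9 threshold is arbitrary. Keys that are similar to each other, directly or through a chain of other keys, end up in the same group.

```python
from difflib import SequenceMatcher
from itertools import combinations

def group_similar(keys, threshold=0.9):
    """Group keys whose pairwise similarity reaches the threshold,
    merging transitively (A~B and B~C puts A, B and C together)."""
    parent = {k: k for k in keys}

    def find(k):
        # Follow parent links to the group representative, compressing the path.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a, b in combinations(parent, 2):
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for k in parent:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())

keys = ["AMPHIY", "AMPHIYO", "AMPHIYOO", "ZOOMBOX"]
print(group_similar(keys))
```

Here "AMPHIY" and "AMPHIYOO" score below the threshold against each other, but both are close to "AMPHIYO", so all three are grouped. The same chaining can over-merge on real data, so the threshold needs checking against known cases.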

As this is a forced example, I know that of the 480 original company names, there are 80 genuine companies, with various legal extensions such as GmbH, SA, Ltd et cetera.

My solution gets me down to 95 suggested distinct companies.

I wonder if this is an appropriate use of the fuzzy match tool? Would love to know your thoughts!

Best wishes,

Jonathan

PS: raw data was attached previously

SeanAdams · 05-10-2017

Thank you Jonathan - very interested to see your updates - will try to take a look tomorrow morning or on the weekend

Cheers!

Sean
