Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Matching lists with different spellings

Grace_Mitchell
5 - Atom

Hello, I am new to Alteryx and have the following issue.

 

I am trying to reward participation for members who have attended an event. To give an organization a participation point, I must upload a csv file in our backend with the exact spelling of the member organization. I have the master list of member organizations but the participation for the event is based on user submitted data. Consequently,  the organizations are spelled differently in the users submitted data (e.g. Hp vs HP Inc. vs HP, Inc. vs HP, etc) than the master list. 

 

Each organization will only get one point even if multiple people from the same organization attend. How can I use Alteryx to compare the two lists and create an output of the unique organizations with the correct spelling?

 

Currently, I removed the duplicates in the user submitted data but unsure how to move forward.

 

Thank you!

 

4 REPLIES 4
MattBSlalom
11 - Bolide

This is a perfect opportunity to try out the Fuzzy Match tool!

 

Load your source datasets with an attribute to tell them apart, then Union them together and assign a Record ID for use in the Fuzzy Match tool.  There are a lot of configurations that can be done within the Fuzzy Match tool, but we can keep it pretty simple for this case, since we're only matching on the single field of the Organization's name.  Next get a Unique set of the match results then Join back to the original lists to get the names back instead of just the IDs from the Fuzzy Match tool.

 

MattBSlalom_0-1618265293731.png

 

Grace_Mitchell
5 - Atom

Thanks so much! I used a much larger data set and had the following error.

It began matching companies that shared one common word as a match. (eg. Lincoln Financial was matched with Synchrony Financial, Holland and Knight LLP was matched with Thompson and Knight LLP). Is there a way to prevent this?

MattBSlalom
11 - Bolide

This is where we start getting into all the configuration options in the Fuzzy Match tool that I tried to gloss over in my first reply 😉

 

The simple answer is to increase the Match Threshold (the "confidence" of an accurate match) to a higher percentage than the 70% I used.  There's a Threshold for the individual criteria (Organization name) and another for the overall match across all criteria.  In this case, they are the same since you have only the 1 field, but they are set separately.

 

MattBSlalom_0-1618272849533.png

 

 

The tradeoff here is that if you increase this value, you'll likely have scenarios on the other side of the spectrum; cases that currently find a match that will no longer meet the increased Match Threshold.  There are a lot of potential gymnastics and trial & error to get Fuzzy Matching to the "perfect" level for a given use case.  So, to get it "good enough" I'd suggest trying a few values in the Match Threshold and finding the least impactful set of "errors".

 

Then potentially just add an earlier step to replace known variations explicitly that are in the fallout set such that they are matched correctly given the replaced value.  Another option is to have a series of Fuzzy Match processes with varying Thresholds & potentially varying "master" data to try to get additional matches after an initial "high" Threshold match step.  As you can see in the config screenshot above, there are a LOT of other options that can be tweaked within a given match field, so yet another option would be to add a second Organization name match field in the tool and have it match with alternative settings such that the outer Match Threshold is now taking effect on the combination of multiple Organization name match attempts.

 

Sorry I don't have an easy button fix for you at this point.  Hopefully the above suggestions help get you to an acceptable solution.

 

Grace_Mitchell
5 - Atom

This was extremely helpful! Thank you for taking the time to explain it!

Labels
Top Solution Authors