I am working on a workflow where I need to join multiple data sources to get the required information. The problem is these data sources doesnt have a common field to join on. My safe bet is to join on names and get the email id that can be used to join multiple data sources. But the problem is that the names in different tables are not in the same format to join.
Table 1:
Mickey Mouse
Duck Donald
Duck Donald
Alvin Li
Mickey Mouse
Table 2:
Mickey x Mouse
Donald Duck
Alvin C Li
Is there a way to join these two sources and get the required information?
Solved! Go to Solution.
Hi @msve!
This sounds like an exact description of Fuzzy Matching. Luckily Alteryx has a tool for this built in, however it is a little complex at first and I rarely see it used to its full potential.
If you'd like to read more about the tool, you can here!
However, I think i have made a workflow that should do this for you.
The method to do so, is to union the data so we have a long list of all the data. Then assign a record ID (as is required in the fuzzy match tool), and run it through fuzzy matching. The main configuration here is:
The threshold (how close they have to be, to be classed as a 'match'), the field name (in this case, we want to compare 'name'), and the Match Style (there's multiple options for this, but we are looking at names with potential nicknames, so that's the option i have chosen).
The output from this, is what records match:
Now as you can see, there are two for each. This is because it will say that 3 is a match of 6, but also 6 is a match of 3. In order to remove these, i simply applied a sample tool, to take the first of each record, and then join them back based on the record ID.
I have attached the workflow, please let me know if this is suitable, or if you have any questions!
Thanks,
TheOC
@TheOC Thank you so much for the detailed explanation. I was about to recreate my workflow 🙂