on 09-09-2020 12:52 PM - edited on 07-21-2021 10:37 AM by kcsrestapiuser
How to prepare two inputs for Fuzzy Match Merge Mode
Here is a quick reference guide that will help you prepare two different data sources for use with Fuzzy Match Merge Mode. This mode only compares data from different sources, and it is often used to merge new data back with the primary data source. The Fuzzy Match Tool accepts only one input, so you will need to prepare the data first.
Here you can see all fields needed for Fuzzy Match Merge Mode Configuration are available.
The example workflow is attached. Once the Fuzzy Match merge is complete, there are many options for completing the workflow, such as adding a Unique Tool to remove duplicates and joining the matching records back with the original data. Please see the articles and training videos in the Additional Resource section for examples and more information.
Additional Resources
Fuzzy Match Tool Alteryx Help Page
Alteryx Academy video training session: Fuzzy Matching for Beginners
Alteryx Academy video training session: Fuzzy Matching Intermediate Users
Hi,
I have couple of questions on this -
1) why did we skip adding record ids for source Input B? We did add them for Input A. Is there any logic behind it?
2) And also you mentioned below
5. Sort by the Source column so that all the Input A records with a record ID are placed first in the list.
But in the workflow you sorted using Name and not Source Column.
3) Also I see we have similar record ids for both inputs, how do we use this output file to join with our original input?
For e.g, 9 Tonja Helmuth - Input A , 9 Stefanie Jand - Input B
how do we differentiate these?
Hi @rick007,
Thanks for your comments. 007 is a classic number for James Bond fans :).
The record IDs for the second input are added using a conditional expression in the Multi-Row Formula tool so that the ID numbers start sequentially after the largest ID number in the first input. Using this method enables the workflow to dynamically adjust the IDs for the second input if the number of records in the first input changes.
Another option is to have a Record ID tool for the second input that starts at a really large number that couldn't overlap with the first input.
The Sort tool was set incorrectly as you noticed, my apologies. I updated the article and attached a corrected workflow.
Hi @EricWe, thanks for the guide!
I also have a question around the creation of recordIDs. Could you just put the 'recordID' tool after the sort instead of putting in after the left join? Wouldn't this have resulted in the same output? i.e. you union everything first, then sort then add recordID.
Or maybe I'm missing something here?
Hi @akasubi
There are multiple ways to add the record ids. We just want two separate sets of ids that do not overlap. The record ids for the left Join output are for Input A (see the source column added in step 3). After the Union tool, records from the Right Join that are marked as Input B have a null value for the record id.
The Sort tool groups all the Input A records with record ids together, Then, when the Multi-Row Formula in step 6 finds the first null value for a record id in the Input B group, it adds 1 to the previous row from Input A (row-1 in the formula). Afterward, it keeps adding 1 to the value in the next row as it goes through the Input B records to replace the null values. It is a way to ensure the record ids never overlap regardless of changes in the input sizes.
Instead of using the Multi-Row Formula tool, you could add another Record Id tool right after the right Join output and start the numbers higher than the maximum amount of records that would occur in Input A.
You don't understand how long I've been waiting for someone to explain Fuzzy Match to me in this concise way. Almost two years of using Alteryx and Fuzzy Match continued to elude me for so long. THANK YOU!!!
I'd like to echo YomaraA, this is an excellent walk through of Fuzzy Match Merge preparation!
I've taken the liberty of adding a couple of steps to the process, joining the data together to get one joined output (i.e. using the Fuzzy Match result!)