Alteryx Designer Knowledge Base

Definitive answers from Designer experts.
The Expert Exam is now live online! Read about the specifics and what it took to bring it to life in the blog by our very own Elizabeth Bonnell!

Prepare two inputs for Fuzzy Match Merge Mode

Alteryx
Alteryx
Created

How to prepare two inputs for Fuzzy Match Merge Mode

 

Here is a quick reference guide that will help you prepare two different data sources for use with Fuzzy Match Merge Mode. This mode only compares data from different sources, and it is often used to merge new data back with the primary data source. The Fuzzy Match Tool accepts only one input, so you will need to prepare the data first.

 

  1. Join the data together, results that match exactly will go through the J output anchor and do not need fuzzy matching.

 

  1. The remaining data from the L and R output anchors will need record ids for matching. Add record IDs for the first input.

 

  1. Use the Formula Tool to create a new column for each input showing the source. A function with just a text string in quotes will add that text to each record. Using Merge Mode, the Fuzzy Match Tool will only compare records with a different source.

 

  1. Ensure the correct alignment of the columns in the Union Tool by using the Manually Configure Fields option.

 

  1. Sort by the Source column so that all the Input A records with a record ID are placed first in the list.

 

  1. Record IDs for the 2nd input are added after the join using an expression so that the IDs automatically start sequentially after the number of records in the 1st input.

 

  1. In the Fuzzy Matching configuration screen, use the new Record ID and Source ID fields, as well as the fuzzy match fields that were stacked together with the Union Tool.


Here you can see all fields needed for Fuzzy Match Merge Mode Configuration are available.

 

EricWe_3-1576850996259.png

 

The example workflow is attached. Once the Fuzzy Match merge is complete, there are many options for completing the workflow, such as adding a Unique Tool to remove duplicates and joining the matching records back with the original data. Please see the articles and training videos in the Additional Resource section for examples and more information.

 

Additional Resources

 

Tool Mastery Fuzzy Match

Fuzzy Match Tool Alteryx Help Page

Alteryx Academy video training session: Fuzzy Matching for Beginners

Alteryx Academy video training session: Fuzzy Matching Intermediate Users


 

Attachments
Comments
7 - Meteor

Hi,

 

I have couple of questions on this - 

 

1) why did we skip adding record ids for source Input B? We did add them for Input A. Is there any logic behind it?

 

2) And also you mentioned below

 

5. Sort by the Source column so that all the Input A records with a record ID are placed first in the list.

 

But in the workflow you sorted using Name and not Source Column.

 

3) Also I see we have similar record ids for both inputs, how do we use this output file to join with our original input? 

 

For e.g, 9 Tonja Helmuth - Input A , 9 Stefanie Jand - Input B

 

how do we differentiate these?

 

 

 

 

Alteryx
Alteryx

Hi @rick007,

 

Thanks for your comments. 007 is a classic number for James Bond fans :).

 

The record IDs for the second input are added using a conditional expression in the Multi-Row Formula tool so that the ID numbers start sequentially after the largest ID number in the first input. Using this method enables the workflow to dynamically adjust the IDs for the second input if the number of records in the first input changes.

 

Another option is to have a Record ID tool for the second input that starts at a really large number that couldn't overlap with the first input.

 

The Sort tool was set incorrectly as you noticed, my apologies. I updated the article and attached a corrected workflow.