Bring your best ideas to the AI Use Case Contest! Enter to win 40 hours of expert engineering support and bring your vision to life using the powerful combination of Alteryx + AI. Learn more now, or go straight to the submission form.
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Fuzzy Match Tool

Varsha_R
5 - Atom

Hi!

     I would really appreciate some guidance regarding the following scenario

 

I have 2 datasets containing Job Code - Location - Activity Description - Date as 4 columns in 2 different excel sheets (Refer Excel Attached). Using alteryx, I need to compare these 2 datasets using the following criteria as a combination and identify the entries are that satisfy all 4 criteria

 

1. Job Code - Exact Match

2. Date - Exact Match

3. Location - Fuzzy Match (80%)

4. Activity Description - Fuzzy Match (85%)

 

Kindly suggest how the workflow can be framed to handle this requirement?

 

Thanks in advance!

3 REPLIES 3
caltang
17 - Castor
17 - Castor

I don't see the job code at all in your data set, and your SEC sheet's activity description seems to have copies of your Location rather than real activities (unless I am seeing it wrong). 

 

Also, why compare both? Perhaps you can use an LLM to get your desired output as well... fuzzy matching is an art more than a science, and I am not sure what those %s mean.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
Pilsner
13 - Pulsar

Hello @Varsha_R 

Like @caltang has pointed out, I cannot see the Job ID in your data. I have tried to takel the scenario with the other three columns.

My initial thoughts where to break this into 3 seperate joins / fuzzy matchs. If we check each criteria individually, then find out which records match on all three sets of criteria, then we can accept it as a valid match overall. These are the steps I followed:

1) Input data and create a Record ID for each input. (To easily differentiate these, I made the record ID start at 1,000,000 for one of the inputs).

 

2) Standardise the columns used in fuzzy matching. For fuzzy matching, the preprocessing steps are just as important as the fuzzy match itself. Here I have simply used a data clense to remove white space and standardise the case. Additionally, I added a Source columns which acts as a lable to distinguish the data sets.

3) Stack the datasets together (as fuzzy match tool only has one input anchor)

 

4) Fuzzy match / join on the required fields, setting the threshold to your desired value:


Fuzzy.png




5) Unique (group by all field) on the fuzzy match and join tool outputs, to get a unique list of the Reocds ID to Record ID mapping.

 

6) Stack the seperate mapping tables together

 

7) Count how many times each match (record ID to record ID) appears. 

 

8) Filter to matches that appear 3 times (i.e matched on all three criteria)

 

9) Use the join tools to join back on record ID to get the original columns in your output.

 

Heres a picture of the full workflow:

Workflow.png

 

Obviously, I havent gone into much detail about how to configure the fuzzy match tool itself, as I believe your quesion mainly related to the process of checking joins/ fuzzy matches against 4 different sets of criteria. If you do want any help on the fuzzy match tool itself then I found the following helpfull:

https://help.alteryx.com/current/en/designer/tools/join/fuzzy-match-tool.html#idp349249

I have attached the workflow below with annotations to try and help explain further. Please do let me know if you have any questions.

Regards - Pilsner

 

WirkKarl
8 - Asteroid

Hi! You can definitely handle this in Alteryx using a mix of exact and fuzzy matching. Start by joining the two datasets on Job Code and Date for the exact match part. Then, use the Fuzzy Match tool separately for Location and Activity Description — just make sure to adjust the thresholds to 80% and 85% respectively. Once you have those results, you can bring everything together to identify the records that meet all four criteria. It might take a bit of trial and error with the fuzzy settings, but Alteryx gives you a lot of control for this kind of comparison. Good luck, and happy building!

Labels
Top Solution Authors