Need Help Building a Fuzzy Match Workflow to Match Vendor Names Across Two Inputs

PassION_es

Hi Alteryx Community,

I hope you can help me with a workflow I’m trying to build in Alteryx Designer (version 2024.2).

I have two input datasets:

Input 1

Contains a free‑text field (DESC1 OR DESC2) with embedded vendor names. Example (see attached dummy file - input 1 tab)

Input 2

Contains clean vendor names (see Input 2 sheet)

Goal

I need to match the vendor name embedded inside Input 1 with the clean vendor name in Input 2 using Fuzzy Match, then produce an output like in the Output tab.

Vendor_Find_name_dummy.xlsx

Fuzzy Match

Accepted answers

binu_acs

Can you provide a list of those names with “Corp.”, “Inc.”, “LTD”, “Limited”, etc.

binu_acs

@PassION_es Try now

Fuzzy_match_20260316_Sequence_Match.yxmd

All comments

alexnajm

I would look at this article - it has a workflow to help you get started! https://community.alteryx.com/t5/Tool-Mastery/Tool-Mastery-Fuzzy-Match/ta-p/45485

binu_acs

@PassION_es Attaching both python and Fuzzy match version, python is more accurate compared to fuzzy match tool

Fuzzy_match_20260316.yxmd

flying008

Hi, @PassION_es

Another way for you: (don't need python or fuzzy match tool.)

录制_2026_03_17_09_28_04_12.gif

PassION_es

Hi @flying008 , thanks for this. However, it did not match the correct output. we have a lot of vendors with some of these have almost similar names such as "United" (United Emirates, United Lab) and such as a vendor name with "Corporation/Corp" on it. It did not match the correct vendor if you will only get one single word from the description or vendor masterlist.

flying008

Hi, @PassION_es

Unless you can provide more sample data or precise matching thresholds, we cannot offer further help.

PassION_es

Hi @flying008

I have added 4 vendor names having the same single word at the end (but some similar names are not always located at the end).

Vendor_Find_name_dummy - Copy.xlsx

PassION_es

Hi @binu_acs ,

The output produced should be the same in the dummy file:

dummy_output_screenshot.png

binu_acs

@PassION_es The python script works for me

Fuzzy_match_20260316.yxmd

flying008

Hi, @PassION_es

You don't even need to modify any part of above workflow, and the output will still be exactly what you want.

Input01		Input02
DESC 1	DESC 2	Vendor Name
IN025-00050023-UNITED IND		DUMMY BANK PHL
IN0b0-00011501-DUMMYBAN		UNITED INDONESIA INSURANCE CORP
ACCRUAL	YOLANDAURSU -001 -PROF FEES FEB-2026	ANGELO JAYVEN
	AMOT NIMO - PEUT SYSTEMS LIMITED	URSUA, YOLANDA
PH001235-ATU Group, Inc.		JL MOLINA
		URLIFE LIMITED
		PEUT SYSTEMS LIMITED
		ATU Group, Inc.
		LAMI-AN INC


Output
DESC 1	DESC 2	Vendor Name
IN025-00050023-UNITED IND		UNITED INDONESIA INSURANCE CORP
IN0b0-00011501-DUMMYBAN		DUMMY BANK PHL
ACCRUAL	YOLANDAURSU -001 -PROF FEES FEB-2026	URSUA, YOLANDA
	AMOT NIMO - PEUT SYSTEMS LIMITED	PEUT SYSTEMS LIMITED
PH001235-ATU Group, Inc.		ATU Group, Inc.

PassION_es

@flying008 can you share your yxmd or yxzp file? Because the workflow i created based on your video resulted to incorrect outputs. Thanks.

PassION_es

Hi binu_acs,

I have tried the recent workflow you posted and there was an error I encountered during running. It came from the Python tool. Please see attached notepad for the error posted from the results window of alteryx.

Do I need to install a python environment and others in my local drive before running the python? Thanks.

python_error.txt

flying008

Hi, @PassION_es

Due to security policy restrictions, I am unable to upload any workflow files. Please upload the files as you created based on the animation, and I will review the options to confirm the differences later(assuming you have disabled AMP mode).

addressbox

Great discussion on building a fuzzy match workflow for vendor names 👍 Handling variations like abbreviations, spelling differences, and extra words is always tricky. Pre-cleaning data (removing terms like “Ltd”, “Co”, etc.) before applying the Fuzzy Match tool really improves accuracy.
Also, adjusting match thresholds and using grouping helps refine results further.
For exploring more data and real estate insights, Addressbox is also a useful platform.

PassION_es

Hello @flying008 ,

Here's the dummy yxmd file. Please note that this workflow now works on my dummy file and it produced the desired results. However, upon using this in an actual inputs with larger datasets, it did not produce the correct vendor name. Could you pls check my workflow? Thanks

Fuzzy_fly.yxmd

screenshots_dtaset_fuzzy.png

flying008

Hi, @PassION_es

1. Please upload the template data for your error. Your workflow file itself depends on the imported xlsx file, but you did not upload it, so I cannot reproduce the scene of the error. (Your workflow tool settings are fine.)
2- Please disable AMP mode.

PassION_es

Hi @flying008 ,

See attched dummy file and workflow. I disabled also the AMP.

Vendor_Find_name_dummy - Copy.xlsx

Fuzzy_fly.yxmd

binu_acs

@PassION_es rapidfuzz is the best option, i updated the code to install this module, can you try now

Copy of Fuzzy_match_20260316.yxmd

PassION_es

Hi Binu_acs,

I think I am having issues with the version for embedding python. I am currently using alteryx 2024.2 version.

Please see attached screenshots:

SC_Configuration.png

SC_ErrorPython.png

SC_Configuration_end.png

PassION_es

@binu_acsThis is what I got from Co-pilot regarding the issue of rapidfuzz

co-pilot_answer_rapidfuzz.docx

binu_acs

@PassION_es try the attached workflow, i m using different module here

Fuzzy_match_20260316_Sequence_Match.yxmd

binu_acs

@PassION_es the module rapidfuzz is the best option for your use-case, !pip install rapidfuzz works for me in alteryx designer, in your case if possible ask your admin team to install the module rapidfuzz then re-use it,

PassION_es

Hello @binu_acs

The new module runs successfully without errors, and most of the outputs are correct. However, there is still an issue with the vendor names that contain the word “Limited.” As shown in the Vendor Name Input File 2, there are two vendors with “Limited” in their names.

In the actual datasets, there are many similar cases where vendor names include terms like “Corp.”, “Inc.”, “LTD”, “Limited”, etc.

To ensure the script produces unique and accurate matches, can you include an additional preprocessing step before the Python module runs? Specifically:

👉 Remove these common legal suffixes from Input 1 (Desc 1–Desc 2 columns) so that the remaining text becomes unique.

If these suffixes are removed prior to processing the python script, the remaining characters in Desc 1–Desc 2 should be distinct enough for the logic to return the full and correct results. Thanks

Screenshot_difflib_output.png

binu_acs

Can you provide a list of those names with “Corp.”, “Inc.”, “LTD”, “Limited”, etc.

PassION_es

hi @binu_acs

No need — I was able to handle it by using a multi‑field formula to remove those common suffixes. 😊

Thanks so much for your help! I think it’s finally time for me to start learning Python when I’m back from my leave.

SC_MultiFieldFormula.png

binu_acs

@PassION_es Try now

Fuzzy_match_20260316_Sequence_Match.yxmd

flying008

Hi, @PassION_es

After testing, your workflow produces correct results here, whether AMP mode is enabled or disabled.

录制_2026_03_21_09_29_27_574.gif

PassION_es

Hello @binu_acs ,

I tried to embed the python in the actual workflow. It never stops running, was it because of the change in input files used? Attached is the complete yxmd file for your reference, without the input files so you can see the whole picture of my workflow. I won't be able to share with you the package as it is prohibited by our company due to some restrictions. Python tool is located at the bottom right corner of the processing tool.

cost center_fuzzy_.yxmd

PassION_es

Hi @flying008 ,

Thanks for accomodating. Upon checking, still it produced incorrect outputs (vendor names)when using larger datasets with thoursands of lines, some lines (desc 1-2) are similar to each other. See sample screenshots attached.

The last two vendor names are correct, but others are not their vendor names.

input_output incorrect.png

flying008

Hi, @PassION_es

Please upload the real source data of your incorrect result. If you did not upload it, I cannot reproduce the scene of the error for figure out the point.

WirkKarl

Fuzzy Match alone might struggle if the vendor name is buried inside longer text. I’d suggest first using a Regex tool to extract likely vendor patterns, then run Fuzzy Match against your clean list.

You can also experiment with different match styles (e.g. Name vs Address) and adjust match thresholds — that can make a big difference depending on how noisy your data is.

PassION_es

Hello @binu_acs ,

Please see attached INPUT FILE 1 Fields with two dummy rows for your reference. The INPUT Files 2 remains the same but with around 600 vendor names only. Thanks in advance.

Dummy_input1_FUZZY.xlsx

binu_acs

@PassION_es try the updated workflow

Fuzzy_match_20260323_Sequence_Match.yxmd

Quick Links

This months top contributors

atcodedog05 19458

Qiu 15866

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703