Looking for some assistance with designing logic for a last-name match using the Fuzzy Match tool, on a dataset of ~1.4M records.
Sample table format:
| COL-A | COL-B | COL-C | COL-D | COL-E |
|-------|-------|-------|-------|-------|
| X | S1 | LASTNAME1 | S2 | LASTNAME10 |
| X | S1 | LASTNAME2 | S2 | LASTNAME2 |
| Y | S3 | LASTNAME3 | S4 | LASTNAME11 |
| Y | S3 | LASTNAME3 | S4 | LASTNAME12 |
| Y | S3 | LASTNAME4 | S4 | LASTNAME13 |
| Z | S5 | LASTNAME5 | S6 | LASTNAME5 |
Last names in the input dataset have spelling and character mismatches, so the algorithm needs to produce a similarity score using whatever text, word, and character comparisons are possible.
I need to compare COL-C against COL-E for similarity and, for each input row, output the score and the key used to map in the fuzzy match. I have tried setting up the Fuzzy Match tool on this table with a custom setting, but since the dataset has 1.3M rows in total it takes a very long time and returns hundreds of billions of rows, because the tool compares every last name in COL-C against every last name in COL-E. I need the comparison to run only within the same row. For example, in row 1 of the sample table, LASTNAME1 should be compared only with LASTNAME10 (the value on the same row) and not with values from other rows. So the expected output from the fuzzy match should be at most the original 1.3M input rows (fewer, since some rows will not match depending on the threshold), not the billions of rows I am currently struggling with.
Sample expected result:

| COL-A | COL-B | COL-C | COL-D | COL-E | Score | MapKey |
|-------|-------|-------|-------|-------|-------|--------|
| X | S1 | LASTNAME1 | S2 | LASTNAME10 | Score1 | MK1 |
| X | S1 | LASTNAME2 | S2 | LASTNAME2 | Score2 | MK2 |
| Y | S3 | LASTNAME3 | S4 | LASTNAME11 | Score3 | MK3 |
| Y | S3 | LASTNAME3 | S4 | LASTNAME12 | Score4 | MK4 |
| Y | S3 | LASTNAME4 | S4 | LASTNAME13 | Score5 | MK5 |
| Z | S5 | LASTNAME5 | S6 | LASTNAME5 | Score6 | MK6 |
Any sample workflow using a batch macro + Fuzzy Match technique, or other matching functions that address this, would be really helpful.
So you just want to go line by line rather than comparing one dataset against the other. That's not how the Fuzzy Match tool is set up, and for a line-by-line comparison I would suggest a formula-based approach for speed rather than the Fuzzy Match tool. The tool does far more than what you are after, and so it takes far longer.
If you go with a formula-style approach, the Jaro-Winkler algorithm is ideally the formula you would be looking for. Take a look at this macro; I have not tested or even downloaded it, so I have no idea on speed etc.
This is the idea for a formula to be added natively (please star it to vote for it): https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Ideas/Levenshtein-and-Jaro-Winkler-Distanc...
And this is a description of the Jaro-Winkler algorithm: https://medium.com/codex/best-libraries-for-fuzzy-matching-in-python-cbb3e0ef87dd#:~:text=all%20huma...
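Since there is no native Jaro-Winkler formula function yet (hence the idea above), one option is to compute the score per row yourself, e.g. in the Python tool. Below is a minimal, self-contained sketch of the algorithm applied row by row; treat it as an illustration rather than a tested solution. The 0.1 prefix weight and 4-character prefix cap are just the usual Winkler defaults, and the sample values come from the table in the question.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as a match if they sit within this window of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions among the matched characters.
    transpositions, j = 0, 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[j]:
                j += 1
            if s1[i] != s2[j]:
                transpositions += 1
            j += 1
    transpositions //= 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Jaro-Winkler: Jaro plus a bonus for a common prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)


if __name__ == "__main__":
    # Row-wise scoring: each COL-C value is compared only with the COL-E value
    # on the same row, so N rows in means at most N rows out.
    rows = [("LASTNAME1", "LASTNAME10"), ("LASTNAME2", "LASTNAME2")]
    for c, e in rows:
        print(c, e, round(jaro_winkler(c.upper(), e.upper()), 3))
```

Because this is a single linear pass over the rows rather than a cross join, the output can never exceed the input row count.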
If you would really like to persist with the tool, then try a batch or iterative macro feeding in one line (or unrelated lines) at a time. Take a look at "Help > Sample Workflows > Use Scripting and.... > Build a Macro > Merge to a master file with Fuzzy Matching" for an example using macros. However, that example is not the same as yours and would require a lot of stripping out, so I would advise looking at that sample but not using it as your starting point.
For the simplest iterative-macro version of line by line, build an iterative macro with parallel Sample tools to split the top row off and then iterate over the rest. But 1.3M iterations is going to be hefty.
Take a look at this: https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/FUZZY-MATCH-line-by-line/td-p/...
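If the Python tool is an option, you can also score the whole table in one pass with no macro iterations at all. This is only a sketch under assumptions I'm making up for illustration: the column names, the MapKey format, and the 0.8 threshold are placeholders, difflib's ratio is just a stand-in scorer (swap in the jaro_winkler() above if you prefer), and the commented ayx lines show where the Python tool's read/write calls would go.

```python
from difflib import SequenceMatcher

import pandas as pd
# from ayx import Alteryx          # uncomment when running inside the Python tool


def similarity(a: str, b: str) -> float:
    """Stand-in scorer; replace with jaro_winkler() from the sketch above if preferred."""
    return SequenceMatcher(None, str(a).upper(), str(b).upper()).ratio()


# df = Alteryx.read("#1")          # inside Designer: read the incoming connection
df = pd.DataFrame({                # stand-alone demo using the sample rows
    "COL_C": ["LASTNAME1", "LASTNAME2", "LASTNAME3"],
    "COL_E": ["LASTNAME10", "LASTNAME2", "LASTNAME11"],
})

# One score per row: COL_C is only ever compared with COL_E on the same row,
# so the output can never be larger than the input (no cross join, no iterations).
df["Score"] = [similarity(c, e) for c, e in zip(df["COL_C"], df["COL_E"])]
df["MapKey"] = df["COL_C"] + "|" + df["COL_E"]   # placeholder key format
matched = df[df["Score"] >= 0.8]                 # placeholder threshold

# Alteryx.write(matched, 1)        # inside Designer: send results to output 1
print(matched)
```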