Alteryx Designer Desktop Discussions

Fuzzy Match Output

umashankar_cns
6 - Meteoroid

Hi Experts

I downloaded this sample workflow to understand the Fuzzy Match tool.

The output looks like this:

[Screenshot: fuzzy_match_output.png]

Could someone please clarify why there are multiple rows for a given pair of records with the same match score, and why the row count differs (3, 2, 1, etc.)?

Examples: Record ID 1 has 3 rows; Record ID 7 has 2 rows; Record ID 10 has 1 row.

Thanks,

Umashankar

5 REPLIES
shancmiralles
11 - Bolide

Hi @umashankar_cns

Based on your sample:

RecordID  Name                  Address
1         Annabel Gillingham    406 Campfire Street
20        Annabelle Gillingham  406 Camp Street

Fuzzy Match calculates the Name and Address scores separately.

Each match is recorded individually, even when the matches belong to the same entity.

If both Name and Address pass the 80% threshold, two separate match records are created.

If a record also matches itself (a self-match), that may add a third, duplicate row.

So:

One row is for the Name match.

One row is for the Address match.

One row is for the overall combined score.

Hope this helps!

Also, may I share this publicly? This is great sample data.

umashankar_cns
6 - Meteoroid

Thanks.

But even if I remove 'Address' from the Fuzzy Match and match on 'Name' alone, I get the same result, i.e., 3, 2, or 1 rows in the output.

shancmiralles
11 - Bolide

Hi @umashankar_cns,

From what I can see, the Fuzzy Match decision is based on the fields you chose to match (Name and Address).
Take your Record ID 1 as an example:

RecordID  Name                  Address              Source
1         Annabel Gillingham    406 Campfire Street  Input A
20        Annabelle Gillingham  406 Camp Street      Input B

Your configuration's output is:

RecordID  RecordID2  Match score  Name score  Address score
1         20         92           96          87
1         20         92           96          87
1         20         92           96          87

Now, if we keep only "Name" in the Match Fields, your result is:

RecordID  RecordID2  Match score  Name score
1         20         96           96
1         20         96           96
1         20         96           96

Still three records, the same as the original output.

Now let's try matching on Address only:

RecordID  RecordID2  Match score  Address score
1         20         87           87

Now we have one record.

So it's the configuration for the Name field that needs fixing.

Going back to the Fuzzy Match that uses only Name as the match field:

You're using DblMetaphone for name matching, so Alteryx may be generating multiple phonetic keys for "Annabel Gillingham" and "Annabelle Gillingham". Each phonetic key can create a separate record in the match output.
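
To make that idea concrete, here is a minimal Python sketch of how several keys per record could turn one pair into several output rows. It only illustrates the hypothesis above and is not how Alteryx is implemented: the keys "K1" to "K6" are invented placeholders (not real Double Metaphone output), and the record pairings are the ones that appear in this thread.

```python
from itertools import product

# Hypothetical placeholder keys per RecordID; NOT real Double Metaphone output.
keys_a = {1: ["K1", "K2", "K3"], 7: ["K4", "K5"], 10: ["K6"]}
keys_b = {20: ["K1", "K2", "K3"], 14: ["K4", "K5"], 18: ["K6"]}


def candidate_pairs(left, right):
    """Emit one (left_id, right_id) row for every key the two records share."""
    rows = []
    for (left_id, left_keys), (right_id, right_keys) in product(left.items(), right.items()):
        for key in left_keys:
            if key in right_keys:
                rows.append((left_id, right_id))
    return rows


rows = candidate_pairs(keys_a, keys_b)
unique_rows = sorted(set(rows))  # collapse the repeats to one row per pair

print(rows)
print(unique_rows)
```

With these made-up keys, the pair (1, 20) is emitted three times, (7, 14) twice, and (10, 18) once, which is the 3, 2, 1 pattern from the question. If this is the cause, a Unique tool on the two RecordID columns would be one way to collapse the repeats.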

 

I modified your settings for the Name match field (Edit Match Options): I changed the key generation from "DblMetaphone" to "None", and under Match Function I chose "Character (No Spaces): Levenshtein Distance". The output:

RecordID  RecordID2  Match score  Name score
1         20         90           90
2         17         87           87
3         12         93           93
4         21         100          100
5         16         85           85
6         13         90           90
7         14         94           94
8         19         92           92
9         22         94           94
10        18         93           93
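
For anyone curious what a "Character (No Spaces): Levenshtein Distance" style comparison does, here is a minimal Python sketch. It is an assumption-laden illustration, not Alteryx's scoring: turning the distance into a percentage by dividing by the longer string's length is my own choice, so the number will not necessarily equal the score Alteryx reports.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Percent similarity after removing spaces and lowercasing.

    Assumption: normalise by the longer string's length.
    """
    a = a.replace(" ", "").lower()
    b = b.replace(" ", "").lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


score = similarity("Annabel Gillingham", "Annabelle Gillingham")
print(round(score, 1))
```

The two names differ by two inserted characters ("le"), so the pair stays well above an 80% threshold under this normalisation.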

 

Comparing my output to your original Name and Address match: your output did not include IDs 4 and 9, while mine does not show ID 11.

If I apply my Name field configuration to your Name and Address match, it returns fewer results.

So I configured the Address field as well:

Generate Keys: None

Match Function: Words: Jaro Distance

Output:

RecordID  RecordID2  Match score  Name score  Address score
1         20         89           90          87
2         17         90           87          94
3         12         92           93          92
4         21         99           100         97
5         16         88           85          92
6         13         90           90          90
7         14         97           94          100
8         19         93           92          94
9         22         96           94          98
10        18         96           93          100


Levenshtein Distance vs DblMetaphone: When to Use Each?
Use DblMetaphone when values may be spelled differently but sound alike, such as personal names and street names.
Use Levenshtein Distance when handling typos, spelling errors, and other character-level differences.
For best accuracy, combine both methods for stronger fuzzy matching in Alteryx.

Try a lookup (Join tool) against the original input to pick out the records that found no match; that will make ID 11 show up.
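
The logic of that lookup, sketched in Python rather than as a workflow: keep the input records whose RecordID never appears in the match output. I am assuming here that the first input holds RecordIDs 1 to 11; the matched IDs are the ones from my output above.

```python
# Assumption: the first input contains RecordIDs 1..11.
input_ids = list(range(1, 12))

# RecordIDs that appear in the match output (taken from the table in this post).
matched_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


def unmatched(all_ids, matched):
    """Return the IDs from all_ids that have no row in the match output."""
    matched_set = set(matched)
    return [record_id for record_id in all_ids if record_id not in matched_set]


missing = unmatched(input_ids, matched_ids)
print(missing)
```

In the workflow, this corresponds to joining the original input to the Fuzzy Match output on RecordID and taking the Join tool's L (left unjoined) output.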

I hope this clears some things up.

Thanks!

umashankar_cns
6 - Meteoroid

Thank you !

shancmiralles
11 - Bolide

@umashankar_cns 

You are welcome!

I hope this gives you the resolution you need.

I hope it's okay if I borrow your data as a sample case. Thanks!
