Fuzzy Match Output
Hi Experts,
I downloaded this sample workflow to understand the Fuzzy Match tool.
The output is shown below.
Could someone please clarify why there are multiple rows with the same match score for a given combination of records? And why does the row count differ (3, 2, 1, etc.)?
Examples: Record ID 1 has count = 3; Record ID 7 has count = 2; Record ID 10 has count = 1.
Thanks,
Umashankar
- Labels:
- Behavior Analysis
- Developer
- Fuzzy Match
- Output
Based on your sample:
1 | Annabel Gillingham | 406 Campfire Street |
20 | Annabelle Gillingham | 406 Camp Street |
Fuzzy Match calculates the Name and Address scores separately.
Each match is recorded individually, even when the matches belong to the same entity.
If both Name and Address pass the 80% threshold, two separate match records are created.
If a record also matches itself (a self-matching issue), that may add a third duplicate row.
So:
One row is for the Name match.
One row is for the Address match.
One row is for the overall combined score.
Hope this helps!
I was also wondering whether I can share this publicly. This is great sample data.
Thanks.
But even if I remove 'Address' from the fuzzy match and match on 'Name' alone, I still get the same result, i.e., 3, 2, or 1 rows for each match. Why is that?
Hi @umashankar_cns!
As I see it, the Fuzzy Match result depends on the fields you chose to match (Name and Address).
Take your Record ID 1 as an example:
1 | Annabel Gillingham | 406 Campfire Street | Input A |
20 | Annabelle Gillingham | 406 Camp Street | Input B |
Your configuration's output is (columns: Record ID, matched Record ID, overall match score, Name score, Address score):
1 | 20 | 92 | 96 | 87 |
1 | 20 | 92 | 96 | 87 |
1 | 20 | 92 | 96 | 87 |
Now, with only "Name" in the Match Fields, the result is:
1 | 20 | 96 | 96 |
1 | 20 | 96 | 96 |
1 | 20 | 96 | 96 |
Still three records, the same as the original output.
Now let's try "Address" alone:
1 | 20 | 87 | 87 |
Now there is only one.
So it is the configuration for the "Name" field that needs fixing.
Going back to the fuzzy match that has only "Name" as a match field:
You're using DblMetaphone to generate keys for Name matching, so Alteryx may be generating multiple phonetic keys for "Annabel Gillingham" and "Annabelle Gillingham". Each phonetic key can create a separate record in the match output.
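To illustrate the idea, here is a minimal Python sketch of how several keys per record could turn one logical match into several identical rows. This is only a guess at the mechanism, not Alteryx's actual internals: the keys "K1" to "K6" are made up, and `candidate_pairs` is a hypothetical helper. The ID pairs 1/20, 7/14 and 10/18 are taken from the outputs in this thread.

```python
from collections import Counter


def candidate_pairs(keys_a, keys_b):
    """Emit one (id_a, id_b) row for every key the two records share."""
    rows = []
    for id_a, record_keys_a in keys_a.items():
        for id_b, record_keys_b in keys_b.items():
            for key in record_keys_a:
                if key in record_keys_b:
                    rows.append((id_a, id_b))
    return rows


# Hypothetical keys: the real phonetic keys are not shown in this thread.
keys_a = {1: ["K1", "K2", "K3"], 7: ["K4", "K5"], 10: ["K6"]}
keys_b = {20: ["K1", "K2", "K3"], 14: ["K4", "K5"], 18: ["K6"]}

rows = candidate_pairs(keys_a, keys_b)
counts = Counter(rows)          # how many rows each ID pair produced
unique_rows = sorted(set(rows))  # one row per ID pair after deduplication

for pair, n in sorted(counts.items()):
    print(pair, "rows:", n)
```

If this is what is happening, deduplicating on the pair of record IDs (for example with a Unique tool after Fuzzy Match) would leave one row per match without changing the scores.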
I modified your settings for the Name match field (Edit Match Option): I changed "DblMetaphone" to "None" and, under Match Function, chose "Character (No Spaces): Levenshtein Distance". The output is now:
1 | 20 | 90 | 90 |
2 | 17 | 87 | 87 |
3 | 12 | 93 | 93 |
4 | 21 | 100 | 100 |
5 | 16 | 85 | 85 |
6 | 13 | 90 | 90 |
7 | 14 | 94 | 94 |
8 | 19 | 92 | 92 |
9 | 22 | 94 | 94 |
10 | 18 | 93 | 93 |
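For anyone curious what a character-level Levenshtein comparison without spaces looks like, here is a rough Python sketch. The normalization (dividing the edit distance by the longer string's length) and the lower-casing are my assumptions, so the number will not necessarily equal the score Alteryx reports; `levenshtein` and `similarity` are illustrative names only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Percent similarity after removing spaces (assumed normalization)."""
    a = a.replace(" ", "").lower()
    b = b.replace(" ", "").lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


distance = levenshtein("AnnabelGillingham", "AnnabelleGillingham")
score = similarity("Annabel Gillingham", "Annabelle Gillingham")
print(distance, round(score, 1))
```

Because this compares the two strings directly instead of going through phonetic keys, each pair of records yields a single score.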
Comparing my output with your original Name and Address output: yours did not include IDs 4 and 9, while mine does not show 11.
If I apply my "Name" field configuration to your Name and Address setup, it returns fewer matches.
So I configured the Address field as well:
Generate Keys: None
Match Function: Words: Jaro Distance
Output:
1 | 20 | 89 | 90 | 87 |
2 | 17 | 90 | 87 | 94 |
3 | 12 | 92 | 93 | 92 |
4 | 21 | 99 | 100 | 97 |
5 | 16 | 88 | 85 | 92 |
6 | 13 | 90 | 90 | 90 |
7 | 14 | 97 | 94 | 100 |
8 | 19 | 93 | 92 | 94 |
9 | 22 | 96 | 94 | 98 |
10 | 18 | 96 | 93 | 100 |
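One thing worth noticing in the rows above: the overall score looks consistent with a plain average of the Name and Address scores, allowing for rounding of the displayed values. That is my reading of the numbers, not something confirmed in this thread, and the check below assumes both fields carry equal weight.

```python
# Rows copied from the output above:
# (Record ID, matched Record ID, overall score, Name score, Address score)
rows = [
    (1, 20, 89, 90, 87),
    (2, 17, 90, 87, 94),
    (3, 12, 92, 93, 92),
    (4, 21, 99, 100, 97),
    (5, 16, 88, 85, 92),
    (6, 13, 90, 90, 90),
    (7, 14, 97, 94, 100),
    (8, 19, 93, 92, 94),
    (9, 22, 96, 94, 98),
    (10, 18, 96, 93, 100),
]

# Gap between the reported overall score and the equal-weight mean.
gaps = [abs(overall - (name + address) / 2)
        for _, _, overall, name, address in rows]
max_gap = max(gaps)
print("largest gap:", max_gap)
```

A gap of at most half a point is what you would expect if the displayed scores are rounded versions of decimal values.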
Levenshtein Distance vs DblMetaphone: When to Use Each?
Use DblMetaphone when comparing names, addresses, and phonetic variations.
Use Levenshtein Distance when handling typos, spelling errors, and text-based differences.
For best accuracy, combine both methods for strong fuzzy matching in Alteryx.
To make ID 11 show up, try a lookup (Join tool) of the input records against the matched IDs to isolate the records that were left unmatched.
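The lookup idea is basically an anti-join. Here is a small Python sketch of it using the matched pairs from my output above. The assumption that the input Record IDs run from 1 to 22 is mine (22 is simply the highest ID that appears in this thread), so adjust it to your data.

```python
# Matched pairs taken from the output earlier in this reply.
matched_pairs = [
    (1, 20), (2, 17), (3, 12), (4, 21), (5, 16),
    (6, 13), (7, 14), (8, 19), (9, 22), (10, 18),
]

# Assumption: the input contains Record IDs 1 through 22.
all_ids = set(range(1, 23))

# Every ID that appears on either side of a match.
matched_ids = {record_id for pair in matched_pairs for record_id in pair}

# Anti-join: input IDs with no match at all.
unmatched = sorted(all_ids - matched_ids)
print(unmatched)
```

The unmatched records can then go through a second, looser Fuzzy Match pass or be reviewed by hand.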
I hope this clears some things up.
Thanks!
Thank you!
You are welcome!
I hope I gave you the resolution you needed.
I hope it's okay if I borrow your data case. Thanks!
