

Fuzzy Match Matchkey

spencer046
8 - Asteroid

Hello,

 

I have the below dataset after fuzzy matching.

Question 1: For record IDs 563-565, 566-569, and 574-577, why are multiple keys generated for the same match type?

 

shahed_sheikh_0-1610157986990.png

 

Question 2: After the fuzzy match, I used the Make Group tool and got the result below. The Fuzzy Match output shows that records 570 and 571 generate the same MatchKey as 563-564 and 567-568, so why does MIELE BELTMANN not appear as a "Key" alongside the others in the MIELE BELTMAN RELCOATION "Group" below?

 

shahed_sheikh_1-1610158294982.png

 

Question 3: Is it possible to identify the MatchKeys for records 241-248 (whether unique or multiple)?

 

Any explanation would be greatly appreciated. 

3 REPLIES
randreag
11 - Bolide

Hi @spencer046

 

I've used the Fuzzy Match tool many times, so I'll try to explain how the keys work.

 

In the tool you can choose two algorithms: one to create the keys and another for the matching. The key algorithm is mandatory, while the matching one is not.
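To make the two stages concrete, here is a minimal Python sketch of the general "generate a key, then score within each key" idea. It is not Alteryx's actual implementation: `make_key`, `match_score`, the 0.8 threshold, and the sample names are all assumptions chosen for illustration, and `difflib.SequenceMatcher` only stands in for whichever match function you pick in the tool.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def make_key(name):
    """Hypothetical key: first letter of each word, upper-cased."""
    return "".join(word[0] for word in name.upper().split())


def match_score(a, b):
    """Stand-in similarity score in [0, 1] from difflib."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def fuzzy_pairs(records, threshold=0.8):
    # Stage 1: bucket record ids by their generated key.
    buckets = defaultdict(list)
    for rec_id, name in records.items():
        buckets[make_key(name)].append(rec_id)

    # Stage 2: score only the pairs that share a key.
    pairs = []
    for key, ids in buckets.items():
        for left, right in combinations(sorted(ids), 2):
            score = match_score(records[left], records[right])
            if score >= threshold:
                pairs.append((key, left, right, round(score, 2)))
    return pairs


# Made-up sample records, not the data from the screenshots.
records = {1: "Miele Beltman", 2: "Miele Beltmann", 3: "Acme Movers"}
print(fuzzy_pairs(records))
```

In this sketch only records 1 and 2 share a key, so they are the only pair that gets scored; record 3 is never compared with them.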

 

When you choose the same field you are analyzing as the key, the tool generates multiple keys even if the words differ by just one letter. So my recommendation is not to use the same field both as the key and as the field for measuring similarity; create a separate field for the key if necessary (I have sometimes used a constant).
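A rough way to picture the effect of a constant key, again as a generic Python sketch rather than what the tool does internally: `candidate_pairs` is a hypothetical helper, the "same field" key is simplified to the exact text, and the records are made up.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, key_func):
    """Return the id pairs that share a key and would therefore be compared."""
    buckets = defaultdict(list)
    for rec_id, name in records.items():
        buckets[key_func(name)].append(rec_id)
    pairs = []
    for ids in buckets.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Made-up sample records for illustration.
records = {1: "MIELE BELTMAN", 2: "MIELE BELTMANN", 3: "ACME MOVERS"}

# Key derived from the analyzed field itself (simplified to the exact text):
# a one-letter difference already produces a different key.
same_field_pairs = candidate_pairs(records, key_func=lambda name: name)

# Constant key: every record lands in the same bucket.
constant_key_pairs = candidate_pairs(records, key_func=lambda name: "ALL")

print(same_field_pairs)
print(constant_key_pairs)
```

With the field-derived key no two records share a bucket, so nothing is compared. With the constant key every pair of distinct records becomes a candidate, and the similarity score alone decides which ones match.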

 

When you use the same field as the key, the tool takes the key and uses it in the matching algorithm (from what I have noticed), whereas if you choose different fields, the field you are analyzing captures the similarity.

 

This is a very similar problem: Fuzzy-Match-for-Finding-duplicate-Invoice-numbers

If you want, you can upload your data and I can help clarify what I just explained.

 

Regards

 

 

 

spencer046
8 - Asteroid

@randreag The names I am currently using are unique (in some cases they differ by only one or two letters), but in the fuzzy match, how do I ensure that a record does not match itself and thereby generate a key?

 

Also, if several names generate the same key and some of them produce a match score (as below), why are they not grouped together when fed through the Make Group tool?

shahed_sheikh_0-1610211913293.png

shahed_sheikh_1-1610211942829.png

 

randreag
11 - Bolide

Hi @spencer046

 

In the example I attached, no record is compared with itself.
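For anyone who wants to check this outside the tool, one generic way to rule out self-comparisons is to build only pairs of distinct record IDs. This is a small Python sketch under my own assumptions (the ID list is just an example), not a description of how Fuzzy Match works internally.

```python
from itertools import combinations, product

# Example record ids, used only for illustration.
record_ids = [563, 564, 565]

# Comparing the list with itself pairs every record with itself
# and lists each pair twice (a-b and b-a).
all_pairs = list(product(record_ids, repeat=2))

# combinations() yields each unordered pair once and never pairs an id with itself.
distinct_pairs = list(combinations(record_ids, 2))

# Equivalent filter when the pairs already exist, e.g. after a join.
filtered = [(a, b) for a, b in all_pairs if a < b]

print(distinct_pairs)
```

Keeping only the rows where the first ID is smaller than the second removes both the self-matches and the mirrored duplicates.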

 

I don't really understand the second question: if you need to group the records, it should be just by the key, with maybe a count of the similarities.
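As a sketch of what I mean by grouping on the key with a count, here is the idea in Python. The `(record_id, match_key)` rows and the key values `K1`/`K2` are hypothetical; in the workflow this would be the fuzzy match output.

```python
from collections import defaultdict

# Hypothetical fuzzy-match output rows: (record_id, match_key).
matches = [
    (1, "K1"),
    (2, "K1"),
    (3, "K2"),
    (4, "K1"),
]

# Collect the record ids that share each key.
groups = defaultdict(list)
for record_id, match_key in matches:
    groups[match_key].append(record_id)

# One summary row per key: its records and how many there are.
summary = {key: {"records": ids, "count": len(ids)} for key, ids in groups.items()}
print(summary)
```

Grouping this way depends only on the key column, so any record that carries the key ends up in that key's group regardless of its match score.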

 

I hope it helps

 

 
