Hello,
I have the below dataset after fuzzy matching.
Question 1: Looking at record IDs 563-565, 566-569, and 574-577, why are multiple keys generated for the same match type?
Question 2: After the fuzzy match, I ran the Make Group tool and got the result below. The Fuzzy Match output shows that records 570 and 571 generate the same MatchKey as 563-564 and 567-568, so why is MIELE BELTMANN not listed as a "Key" alongside the others in the MIELE BELTMAN RELCOATION "Group" below?
Question 3: Is it possible to identify which MatchKeys were generated for records 241-248 (whether unique or multiple)?
Any explanation would be greatly appreciated.
Hi @spencer046
I've used the Fuzzy Match tool many times, so I'll try to explain how the keys work.
In the tool you can choose two algorithms: one to generate the keys and another for matching. The key algorithm is mandatory, while the match algorithm is optional.
When you choose the same field you are analyzing as the key field, the tool generates multiple keys, even when the words differ by just one letter. So my recommendation is not to use the same field both as the key and as the field in which you look for similarity. Create a separate field for the key if necessary (sometimes I have used a constant).
When you use the same field as the key, the tool takes the key and uses it in the matching algorithm (from what I have noticed), whereas if you choose different fields, the similarity is captured in the field you are analyzing.
This thread covers a very similar problem: Fuzzy-Match-for-Finding-duplicate-Invoice-numbers
If you want, you can upload your data and I can help clarify what I just explained.
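To make the two-stage idea concrete, here is a conceptual sketch in Python. It is not how Alteryx implements Fuzzy Match: the prefix key, the constant key, and difflib's SequenceMatcher (standing in for the match algorithm) are all assumptions for illustration, and the records are hypothetical. It shows that records are first bucketed by key and only records sharing a key are ever scored, so the key choice decides which pairs get compared. Using `combinations()` also means a record is never paired with itself.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records (record_id, name); illustrative only, not the poster's data.
RECORDS = [
    (1, "MIELE BELTMANN"),
    (2, "MEILE BELTMANN"),  # typo within the first four letters
    (3, "ACME CORP"),
]

def prefix_key(name: str, n: int = 4) -> str:
    """Hypothetical key: the first n letters of the name, uppercased, non-letters dropped."""
    letters = [c for c in name.upper() if c.isalpha()]
    return "".join(letters[:n])

def constant_key(name: str) -> str:
    """Constant key: every record lands in one bucket, so every pair is compared."""
    return "ALL"

def similarity(a: str, b: str) -> float:
    """Stand-in match algorithm: difflib ratio scaled to 0-100."""
    return 100 * SequenceMatcher(None, a, b).ratio()

def fuzzy_pairs(records, key_fn, threshold=80.0):
    # Stage 1: bucket records by their generated key.
    buckets = defaultdict(list)
    for rid, name in records:
        buckets[key_fn(name)].append((rid, name))
    # Stage 2: score only pairs inside the same bucket.
    # combinations() yields each unordered pair once and never (record, itself).
    pairs = []
    for key, members in buckets.items():
        for (r1, n1), (r2, n2) in combinations(members, 2):
            score = similarity(n1, n2)
            if score >= threshold:
                pairs.append((key, r1, r2, round(score, 1)))
    return pairs

if __name__ == "__main__":
    print("prefix key:  ", fuzzy_pairs(RECORDS, prefix_key))    # []
    print("constant key:", fuzzy_pairs(RECORDS, constant_key))  # [('ALL', 1, 2, 92.9)]
```

With the prefix key, records 1 and 2 get different keys ("MIEL" vs "MEIL") and are never compared, even though their names are about 93% similar; with a constant key they are compared and matched. A constant key is simple but compares every pair, which becomes slow on large datasets.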
Regards
@randreag The names I am currently using are unique (in some cases they differ by only 1-2 letters), but in the fuzzy match, how do I ensure that a record does not match itself and thereby generate a key?
Also, if several names generate the same key and some of them produce a match score (like below), why are they not grouped together when fed through the Make Group tool?
Hi @spencer046
In the example I attached, no record is compared with itself.
I don't really understand the second question, because if you need to group the results, you should group only by the key, and perhaps add a count of the similar records.
I hope this helps.
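Grouping by key can be sketched as follows. This is a hedged illustration in Python using a union-find (connected components), not the Make Group tool's actual implementation: each matched pair is treated as a link, any record linked to a group member joins that group, and each group is reported under a key (here the smallest record id, an arbitrary choice) with a member count. The pairs are hypothetical.

```python
from collections import defaultdict

def make_groups(pairs):
    """Group record ids into connected components from (id_a, id_b) match pairs.

    Returns {group_key: sorted member ids}, where group_key is the smallest id
    in the group. Records that appear in no pair are not included.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller root so each group's key is its smallest id.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    groups = defaultdict(list)
    for x in list(parent):
        groups[find(x)].append(x)
    return {key: sorted(members) for key, members in groups.items()}

if __name__ == "__main__":
    # Hypothetical match pairs: 1 and 3 never matched directly but share 2.
    pairs = [(1, 2), (2, 3), (4, 5)]
    for key, members in make_groups(pairs).items():
        print(f"key={key} count={len(members)} members={members}")
```

Because grouping follows chains of matches, a record can end up in a group even if it never matched the group's key record directly. Conversely, a record missing from the match pairs entirely (for example, because the threshold filtered out its score) will not be placed in any group.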