Hi
I have a problem with the Fuzzy Match tool. I'm trying to fuzzy match names, and these are examples of pairs I want to match:
| A | B |
| --- | --- |
| ASMAYATI ABDULAZIM | ASMAYATI BINTI ABDULAZIM |
| AZIZANG BIN IBRAHIM | AZIZANG IBRAHIM |
| CANDAY LOH FOONG MANG | LOH FOONG MANG |
However, with my current settings (match threshold 90%, match style: Name), the tool also picks up pairs that should not match, for example:
| C | D |
| --- | --- |
| CHIAH YUN CHING | CHIA SHYAN CHING |
| LOKE HENG FATTY | LEE CHONG FATTY |
| NURA ALIYANA BINTI MOHAMMADAD | NURA LIYANA BINTI MUHAMADAD |
| WUN PEI KENG | WUN PEI KANG |
Is there a way to fix this?
Hi @alkafalhas - I don't believe this is achievable at the level you describe. The problem is that what you describe as a "wrong" match is objectively a better match than the ones you want. I tried different Match Functions, and the "wrongs" always tend to get a higher matching score.
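To illustrate why this happens, here is a minimal Python sketch using a plain normalized Levenshtein similarity. This is an assumption for illustration only, not the scoring that the Fuzzy Match tool actually uses; the helper names `levenshtein` and `similarity` are hypothetical. It compares one wanted pair and one unwanted pair from the question.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalize the distance to a 0..1 score (1.0 means identical strings)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


# A pair the poster wants matched: six characters ("BINTI ") are inserted.
wanted = similarity("ASMAYATI ABDULAZIM", "ASMAYATI BINTI ABDULAZIM")
# A pair the poster does not want matched: only one character differs.
unwanted = similarity("WUN PEI KENG", "WUN PEI KANG")

print(round(wanted, 2))    # 0.75
print(round(unwanted, 2))  # 0.92
```

Under this measure, any threshold low enough to accept the wanted pair also accepts the unwanted one, so a single whole-string threshold cannot separate the two groups.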
I'd recommend looking at other features in your data set, or at the process as a whole, to achieve the desired outcome. A less elegant idea would be to create a table of exceptions and use it to remove the "wrongs" from the workflow before Fuzzy Matching.
Have you considered parsing the name into components and then fuzzy matching those? I think the way it searches gives equal weight to variation at different points in the name. However, as you've pointed out, some sections of the name may be more important than others. In your case, you'd probably want a 100% match on "last name" and differing levels of fuzzy match on the other parts.
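A rough Python sketch of that idea, outside the tool, could look like the following. Everything here is an assumption for illustration: the last whitespace-separated token is treated as the "last name", `BIN` and `BINTI` are dropped as connecting particles, the 0.8 per-component threshold is arbitrary, and `components` and `names_match` are hypothetical helper names. The per-component score comes from Python's standard `difflib.SequenceMatcher`, not from the Fuzzy Match tool.

```python
from difflib import SequenceMatcher

# Connecting particles that may be present in one spelling and absent in the other.
PARTICLES = {"BIN", "BINTI"}


def components(name):
    """Split a name into upper-case tokens, dropping the particles."""
    return [t for t in name.upper().split() if t not in PARTICLES]


def names_match(name_a, name_b, threshold=0.8):
    a, b = components(name_a), components(name_b)
    if not a or not b:
        return False
    # Require an exact (100%) match on the last component.
    if a[-1] != b[-1]:
        return False
    # Every remaining component of the shorter name must have a
    # sufficiently similar counterpart somewhere in the longer name.
    shorter, longer = sorted((a[:-1], b[:-1]), key=len)
    return all(
        any(SequenceMatcher(None, s, l).ratio() >= threshold for l in longer)
        for s in shorter
    )


pairs = [
    ("ASMAYATI ABDULAZIM", "ASMAYATI BINTI ABDULAZIM"),
    ("AZIZANG BIN IBRAHIM", "AZIZANG IBRAHIM"),
    ("CANDAY LOH FOONG MANG", "LOH FOONG MANG"),
    ("CHIAH YUN CHING", "CHIA SHYAN CHING"),
    ("LOKE HENG FATTY", "LEE CHONG FATTY"),
    ("NURA ALIYANA BINTI MOHAMMADAD", "NURA LIYANA BINTI MUHAMADAD"),
    ("WUN PEI KENG", "WUN PEI KANG"),
]
for left, right in pairs:
    print(left, "|", right, "|", names_match(left, right))
```

On the sample rows from the question, this accepts the three A/B pairs and rejects the four C/D pairs. The trade-off is that the exact last-component rule also rejects genuine spelling variants of a last name, so it would need checking against the real data before relying on it.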
