Hi
I have a problem with the Fuzzy Match tool. I'm trying to use fuzzy matching on names, for example:
| A | B |
|---|---|
| ASMAYATI ABDULAZIM | ASMAYATI BINTI ABDULAZIM |
| AZIZANG BIN IBRAHIM | AZIZANG IBRAHIM |
| CANDAY LOH FOONG MANG | LOH FOONG MANG |
However, with the current settings the tool also picks up wrong matches, such as the ones below.
Current settings: match threshold 90%, match style: Name
| C | D |
|---|---|
| CHIAH YUN CHING | CHIA SHYAN CHING |
| LOKE HENG FATTY | LEE CHONG FATTY |
| NURA ALIYANA BINTI MOHAMMADAD | NURA LIYANA BINTI MUHAMADAD |
| WUN PEI KENG | WUN PEI KANG |
Is there a way to fix this?
Hi @alkafalhas, I don't believe this is achievable at the level of precision you describe. The problem is that what you call a "wrong" match is, in terms of string similarity, objectively a closer match. I tried different match functions, and the "wrong" pairs consistently got a higher match score than the pairs you want.
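To illustrate why the "wrong" pairs can outscore the desired ones, here is a minimal Python sketch using a plain normalized Levenshtein similarity (1 - edit distance / length of the longer string). This metric is an assumption for illustration only; Alteryx's match functions compute their scores differently, so the numbers will not be the tool's.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters are equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


pairs = [
    ("ASMAYATI ABDULAZIM", "ASMAYATI BINTI ABDULAZIM"),  # desired match
    ("WUN PEI KENG", "WUN PEI KANG"),                      # "wrong" match
]
for a, b in pairs:
    print(f"{similarity(a, b):.2f}  {a} | {b}")
# 0.75  ASMAYATI ABDULAZIM | ASMAYATI BINTI ABDULAZIM
# 0.92  WUN PEI KENG | WUN PEI KANG
```

A one-letter difference costs a single edit, while the extra "BINTI " costs six, so any purely character-based score will rank the "wrong" pair higher.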
I'd recommend looking at other features in your data set, or at the process holistically, to achieve the desired outcome. A less elegant idea would be to create a table of exceptions and use it to remove the "wrong" matches from the workflow before fuzzy matching.
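A minimal sketch of the exception-table idea in Python, assuming the table stores name pairs that a reviewer has confirmed are different people. Here it is applied to candidate pairs (a variation on removing records up front), and the pairs are stored order-insensitively so "A | B" and "B | A" are both caught. `EXCEPTIONS` and `drop_exceptions` are hypothetical names, not part of Alteryx.

```python
# Hypothetical exception table: pairs confirmed NOT to be the same person.
# frozenset makes the lookup independent of which column each name came from.
EXCEPTIONS = {
    frozenset({"CHIAH YUN CHING", "CHIA SHYAN CHING"}),
    frozenset({"LOKE HENG FATTY", "LEE CHONG FATTY"}),
}


def drop_exceptions(candidates, exceptions):
    """Return only the (name_a, name_b) pairs that are not listed as known exceptions."""
    return [pair for pair in candidates if frozenset(pair) not in exceptions]


candidates = [
    ("CHIAH YUN CHING", "CHIA SHYAN CHING"),
    ("LOKE HENG FATTY", "LEE CHONG FATTY"),
    ("AZIZANG BIN IBRAHIM", "AZIZANG IBRAHIM"),
]
for a, b in drop_exceptions(candidates, EXCEPTIONS):
    print(f"{a} | {b}")
```

The drawback is the one noted above: the table only covers pairs someone has already reviewed, so it has to be maintained as new data arrives.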
Have you considered parsing the name into its components and then fuzzy matching those? I think the way the tool searches gives equal weight to variation at any point in the name; however, as you've pointed out, some sections of the name matter more than others. In your case, you'd probably want a 100% match on the "last name" and varying levels of fuzzy match on the other parts.
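A rough Python sketch of the component-based approach, under several assumptions: the last whitespace-separated token is treated as the "last name" and must match exactly; BIN and BINTI are dropped as connector words; each remaining token on the side with fewer tokens is compared against the other side with `difflib.SequenceMatcher`, and the average of those best scores must reach a hypothetical threshold of 0.8. `CONNECTORS`, `split_name`, `given_name_score`, `names_match` and the threshold are all illustrative, not Alteryx features.

```python
import difflib

# Hypothetical connector tokens to ignore (Malay "son of"/"daughter of" markers).
CONNECTORS = {"BIN", "BINTI"}


def split_name(full_name):
    """Split into (given-name tokens, last name), dropping connector tokens.
    Assumes the last token is the "last name"."""
    tokens = [t for t in full_name.upper().split() if t not in CONNECTORS]
    if not tokens:
        return [], ""
    return tokens[:-1], tokens[-1]


def given_name_score(a_tokens, b_tokens):
    """Average, over the shorter token list, of each token's best
    SequenceMatcher ratio against the longer list (0.0 to 1.0)."""
    shorter, longer = sorted((a_tokens, b_tokens), key=len)
    if not shorter:
        # Conservative choice: match only when neither side has given names.
        return 1.0 if not longer else 0.0
    best = [
        max(difflib.SequenceMatcher(None, s, t).ratio() for t in longer)
        for s in shorter
    ]
    return sum(best) / len(best)


def names_match(a, b, threshold=0.8):
    a_given, a_last = split_name(a)
    b_given, b_last = split_name(b)
    if not a_last or a_last != b_last:  # exact match required on the last name
        return False
    return given_name_score(a_given, b_given) >= threshold


if __name__ == "__main__":
    for a, b in [
        ("CANDAY LOH FOONG MANG", "LOH FOONG MANG"),
        ("LOKE HENG FATTY", "LEE CHONG FATTY"),
        ("WUN PEI KENG", "WUN PEI KANG"),
    ]:
        print(names_match(a, b), a, "|", b)
```

On the pairs from the question, this accepts the three desired matches and rejects the four "wrong" ones. Which token counts as the "last name" depends on the naming convention, though (in LOH FOONG MANG the family name LOH comes first), so the split rule and threshold would need tuning against real data.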