I built a fuzzy match service that compares each description from 2019 against the 2020 descriptions and matches it with the closest one. The descriptions consist of alphanumeric characters. For example, one of the descriptions in the 2019 data is GLOBULIN 20 GM/200 ML, and it is being matched with the 2020 description GLOBULIN 20 GM/400 ML. How can I make sure that the letters only need to be a 95% match but the digits must be a 100% match? 'GLOBULIN 20 GM/200 ML' in 2019 should match only 'GLOBULIN 20 GM/200 ML' in 2020.
I'm using the settings below.
Am I supposed to use Double Metaphone instead of Double Metaphone w/Digits? Am I using the right Match function?
The expected output should be only the second row, highlighted in yellow.
I believe you can output the 'Match Score' when configuring your Fuzzy Match. Then, before you apply the "Unique" building block, sort your data by that match score in descending order so that it keeps the 100% matched record instead of the 96% one. Hope this helps. Cheers!
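To make the logic concrete outside the tool, here is a minimal Python sketch of the two ideas in this thread: require the numeric parts to be identical, score only the letters, then sort by score and keep the best candidate. It is not how the Fuzzy Match tool works internally. The function names (`split_description`, `match_score`, `best_match`) are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever match function you configure.

```python
import re
from difflib import SequenceMatcher


def split_description(text):
    """Return (letters-only text, list of numeric tokens) for a description."""
    digits = re.findall(r"\d+(?:\.\d+)?", text)
    letters = re.sub(r"[^A-Za-z]+", " ", text).strip().upper()
    return letters, digits


def match_score(a, b):
    """Score 0-100 on the letters; 0 if the numeric tokens differ at all."""
    letters_a, digits_a = split_description(a)
    letters_b, digits_b = split_description(b)
    if digits_a != digits_b:
        return 0.0  # digits must be a 100% match
    return 100.0 * SequenceMatcher(None, letters_a, letters_b).ratio()


def best_match(description, candidates, threshold=95.0):
    """Keep candidates at or above the threshold and return the top-scoring one."""
    scored = [(match_score(description, c), c) for c in candidates]
    scored = [pair for pair in scored if pair[0] >= threshold]
    if not scored:
        return None
    # Sort by score, highest first, so the best match is kept
    # (the same idea as sorting before the "Unique" step).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[0][1]


candidates_2020 = ["GLOBULIN 20 GM/400 ML", "GLOBULIN 20 GM/200 ML"]
result = best_match("GLOBULIN 20 GM/200 ML", candidates_2020)
print(result)
```

In this sketch the 400 ML row is rejected because its numeric tokens differ, regardless of how similar the letters are. Inside the workflow, a comparable effect might be achieved by pulling the digits into a separate field and requiring an exact match on it, but that is an assumption about your setup rather than a tested configuration.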