Hi Team,
I need some assistance.
I am performing fuzzy matching to find potential duplicates in our system. Whilst analyzing the results, I noticed that in some instances it gives me false positives, which I need to identify.
To eliminate these false positives, I was thinking about creating an additional column that flags a record as not a duplicate based on certain rules.
Using the example below, the logic should work as follows:
NB :
Group | Name | ID | Source | Name | Outcome |
8650 | Mason LTD | 11111 | FINRA | CRD | Not a duplicate |
8650 | Mason LTD | 22222 | FINRA | CRD | Not a duplicate |
8651 | Amazon Ltd | 33333 | FCA | FCA | Not a duplicate |
8651 | Amazon Ltd | 33334 | FCA | FCA | Not a duplicate |
8652 | Alteryx PLC | 11111 | FINRA | CRD | |
8652 | Alteryx PLC | | | | |
8653 | Tesla Ltd | 11111 | FINRA | CRD | |
8653 | Tesla Ltd | 33333 | FCA | FCA | |
8654 | Costa | 11111 | FINRA | CRD | |
8654 | Costa | | | | |
8654 | Costa | 33333 | FCA | FCA | |
Looking forward to your response.
Kind regards
Masond3
Hi @Masond3 - If you can define rules, you can then take the "Not a duplicate" records out before or after the Fuzzy Match tool using a simple Join. However, the example that you provided does not look like a use case for Fuzzy Matching.
@ArtApa My matching is based on name and address. In the example above, the matching identified these records as duplicates because the name and address are the same. However, they have a different source and ID, and that's why I want a formula that runs after the algorithm and tells me that it's not a match.
Hi @Masond3 - If I understood you correctly, here is what your solution might look like:
Please check 8653. It looks like it's "Not a duplicate".
@ArtApa Thank you for providing this example. Looking at 8653, the correct answer is the one in the "Outcome" column.
Why this is correct: 8653 has a count of 2, but there are 2 different sources, so these two records still need to be compared. Therefore there shouldn't be a value populated in "Outcome 2".
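For anyone reading this later, the rule implied by the example table can be sketched in Python. This is only a rough sketch and not the accepted workflow: the rule is inferred from the sample rows (a group is flagged "Not a duplicate" only when every record in it has an ID and a source, all sources are the same, and all IDs differ), and the function name `flag_outcomes` and the tuple layout are made up for illustration.

```python
from collections import defaultdict

# Rows copied from the example table: (group, name, id, source).
# None marks a blank cell.
rows = [
    (8650, "Mason LTD", "11111", "FINRA"),
    (8650, "Mason LTD", "22222", "FINRA"),
    (8651, "Amazon Ltd", "33333", "FCA"),
    (8651, "Amazon Ltd", "33334", "FCA"),
    (8652, "Alteryx PLC", "11111", "FINRA"),
    (8652, "Alteryx PLC", None, None),
    (8653, "Tesla Ltd", "11111", "FINRA"),
    (8653, "Tesla Ltd", "33333", "FCA"),
    (8654, "Costa", "11111", "FINRA"),
    (8654, "Costa", None, None),
    (8654, "Costa", "33333", "FCA"),
]


def flag_outcomes(rows):
    """Return {group: outcome} using the rule inferred from the example."""
    groups = defaultdict(list)
    for group, _name, rec_id, source in rows:
        groups[group].append((rec_id, source))

    outcomes = {}
    for group, members in groups.items():
        ids = [rec_id for rec_id, _ in members]
        sources = {source for _, source in members}
        # Every record must have both an ID and a source.
        complete = all(i is not None and s is not None for i, s in members)
        # Same source throughout, but no two records share an ID.
        same_source = len(sources) == 1
        distinct_ids = len(set(ids)) == len(ids)
        if len(members) > 1 and complete and same_source and distinct_ids:
            outcomes[group] = "Not a duplicate"
        else:
            # Left blank: these records still need to be compared.
            outcomes[group] = ""
    return outcomes


outcomes = flag_outcomes(rows)
for group, outcome in outcomes.items():
    print(group, outcome or "(blank)")
```

Groups with mixed sources (such as 8653) or with blank IDs (8652, 8654) are deliberately left blank, which matches the "Outcome" column in the original example.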