Hi Team.
I need some assistance,
I am performing fuzzy matching to find potential duplicates in our system, whilst analyzing the results i have noticed that in some instances its giving me false positives which i need to identify.
To eliminate these false positives i was thinking about creating an additional column to call out that its not a duplicate based on certain rules.
Using the the example Below the logic should work as follows ;
NB :
- Only assess values within a group
- If Count of Group = 2 and Source= FINRA & the IDs are different then "Not a Duplicate"
- If Count of Group = 2 and Source= FCA & the IDs are different then "Not a Duplicate"
| Group | Name | ID | Source | Name | Outcome |
| 8650 | Mason LTD | 11111 | FINRA | CRD | Not a duplicate |
| 8650 | Mason LTD | 22222 | FINRA | CRD | Not a duplicate |
| 8651 | Amazon Ltd | 33333 | FCA | FCA | Not a duplicate |
| 8651 | Amazon Ltd | 33334 | FCA | FCA | Not a duplicate |
| 8652 | Alteryx PLC | 11111 | FINRA | CRD | |
| 8652 | Alteryx PLC | | | | |
| 8653 | Tesla Ltd | 11111 | FINRA | CRD | |
| 8653 | Tesla Ltd | 33333 | FCA | FCA | |
| 8654 | Costa | 11111 | FINRA | CRD | |
| 8654 | Costa | | | | |
| 8654 | Costa | 33333 | FCA | FCA | |
Looking forward to your response.
Kind regards
Masond3