Hi guys,
I got this difficult but also really interesting scenario and would love to hear your advise on it.
The goal: Finding out all the duplicates and remove them based on their rating score.
The data set looks kinda like this:
| ClientID | Rate |
| 00001 | 8 |
| 00002 | 5 |
| 00003 | 4 |
| 00004 | 8 |
| 00005 | 6 |
| 00006 | 7 |
| 00007 | 7 |
| 00008 | 7 |
| 00009 | 0 |
| 00010 | 2 |
| 00011 | 5 |
After using the Fuzzy match and Unique tool to filter out the duplicates, we got this matching result:
| ClientID 1 | ClientID2 | Rate ClientID 1 | Rate ClientID2 |
| 00001 | 00002 | 8 | 5 |
| 00001 | 00003 | 8 | 4 |
| 00002 | 00003 | 5 | 4 |
| 00004 | 00007 | 8 | 7 |
| 00005 | 00006 | 6 | 7 |
| 00008 | 00009 | 7 | 0 |
| 00008 | 00010 | 7 | 2 |
| 00008 | 00011 | 7 | 5 |
| 00009 | 00010 | 0 | 2 |
| 00009 | 00011 | 0 | 5 |
| 00010 | 00011 | 2 | 5 |
So now we know all the potential matching pairs. In each group, client with highest score will be survival, the rest will be deleted.
Eg: in group of 00001, 00002, 00003, since 00001 has highest rate, it will be the survival and 00002, 00003 will be deleted.
=> Expected outcome
| Survival | Duplicates |
| 00001 | 00002, 0003 |
| 00004 | 00007 |
| 00006 | 00005 |
| 00008 | 00009, 00010, 00011 |
Any idea on how to achieve this?
Thank you so so much in advance!!!