Hi guys,
I got this difficult but also really interesting scenario and would love to hear your advise on it.
The goal: Finding out all the duplicates and remove them based on their rating score.
The data set looks kinda like this:
ClientID | Rate |
00001 | 8 |
00002 | 5 |
00003 | 4 |
00004 | 8 |
00005 | 6 |
00006 | 7 |
00007 | 7 |
00008 | 7 |
00009 | 0 |
00010 | 2 |
00011 | 5 |
After using the Fuzzy match and Unique tool to filter out the duplicates, we got this matching result:
ClientID 1 | ClientID2 | Rate ClientID 1 | Rate ClientID2 |
00001 | 00002 | 8 | 5 |
00001 | 00003 | 8 | 4 |
00002 | 00003 | 5 | 4 |
00004 | 00007 | 8 | 7 |
00005 | 00006 | 6 | 7 |
00008 | 00009 | 7 | 0 |
00008 | 00010 | 7 | 2 |
00008 | 00011 | 7 | 5 |
00009 | 00010 | 0 | 2 |
00009 | 00011 | 0 | 5 |
00010 | 00011 | 2 | 5 |
So now we know all the potential matching pairs. In each group, client with highest score will be survival, the rest will be deleted.
Eg: in group of 00001, 00002, 00003, since 00001 has highest rate, it will be the survival and 00002, 00003 will be deleted.
=> Expected outcome
Survival | Duplicates |
00001 | 00002, 0003 |
00004 | 00007 |
00006 | 00005 |
00008 | 00009, 00010, 00011 |
Any idea on how to achieve this?
Thank you so so much in advance!!!
@DustinNg_
Looks like an interesting topic.
I think it can be done with an iterative macro
Would you like to give it a try? 😉
@DustinNg_
I would like to, just having a busy day. maybe in the weekend.
before that, I am sure someelse would offer a better one.
I realized that it can be solved quite effectively by importing Python into the workflow and do all the coding there. That gives us lots of flexibility.
However, still would love to hear you guys advise on how we can solve it by using Alteryx purely.