Alteryx Designer Desktop Discussions

DustinNg_ · ‎01-05-2022

Hi guys,

I got this difficult but also really interesting scenario and would love to hear your advise on it.

The goal: Finding out all the duplicates and remove them based on their rating score.

The data set looks kinda like this:

ClientID	Rate
00001	8
00002	5
00003	4
00004	8
00005	6
00006	7
00007	7
00008	7
00009	0
00010	2
00011	5

After using the Fuzzy match and Unique tool to filter out the duplicates, we got this matching result:

ClientID 1	ClientID2	Rate ClientID 1	Rate ClientID2
00001	00002	8	5
00001	00003	8	4
00002	00003	5	4
00004	00007	8	7
00005	00006	6	7
00008	00009	7	0
00008	00010	7	2
00008	00011	7	5
00009	00010	0	2
00009	00011	0	5
00010	00011	2	5

So now we know all the potential matching pairs. In each group, client with highest score will be survival, the rest will be deleted.

Eg: in group of 00001, 00002, 00003, since 00001 has highest rate, it will be the survival and 00002, 00003 will be deleted.

=> Expected outcome

Survival	Duplicates
00001	00002, 0003
00004	00007
00006	00005
00008	00009, 00010, 00011

Any idea on how to achieve this?

Thank you so so much in advance!!!

Qiu · ‎01-05-2022

@DustinNg_
Looks like an interesting topic.
I think it can be done with an iterative macro

DustinNg_ · ‎01-05-2022

Would you like to give it a try? 😉

Qiu · ‎01-05-2022

@DustinNg_
I would like to, just having a busy day. maybe in the weekend.
before that, I am sure someelse would offer a better one.

DustinNg_ · ‎01-06-2022

I realized that it can be solved quite effectively by importing Python into the workflow and do all the coding there. That gives us lots of flexibility.

However, still would love to hear you guys advise on how we can solve it by using Alteryx purely.

Alteryx Designer Desktop Discussions

Separate unique and duplicates based on particular condition

Re: Macro not Looping thru Files in Folder

Re: Is there any way the computer vision tools can...

Re: Batch Macro

Re: How to get cell reference address from excel

Re: Replacing Forecast columns with Actual Data