Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Remove duplicate records based on two columns

timothyyeo
7 - Meteor

Hi All,

 

I have a set of records and would like to remove duplicate values based on two columns.

 

Document NoInvoice No
1231
4561
1232
4562

 

The correct set of records to be returned should be:

Document NoInvoice No
1231
4562

 

In other words, the invoice number  cannot be repeated. If Invoice Number 1 has already matched to Document No 123, then it should not be matched again to Document No 456.

 

How can I do this in Alteryx? Using the unique tool either on both columns or one of the columns does not work.

11 REPLIES 11
JoeS
Alteryx
Alteryx

Ok, now with the actual solution.

 

An Iterative Macro.

 

Macro.jpg

 

Whereby we loop round choosing the minimum Invoice No each time, then removing that invoice number and document number from the next iteration.

MichalM
Alteryx
Alteryx

@timothyyeo 

 

Is it always going to be the case that you will have a matching number of unique Document and Invoice numbers? If that's the case you'd probably want to do something like this

 

  • Group by Document No
  • Group by Invoice No
  • Join by record position

invoice-no.png

 

If that's not the case, what Invoice no will you use?

 

JosephSerpis
17 - Castor
17 - Castor

Hi @timothyyeo mocked something up that produces the output. Would need to be tested against you data to ensure it would work.Filter_Two_Columns.JPG

timothyyeo
7 - Meteor

Hi @MichalM,

 

Thanks for the fast reply. However, the outcome I need is:

Doc NoInv No
1231
4562

 

However, the outcome of your method is returning

Doc NoInv No
1231
4561

 

Effectively, I feel the most complete solution would be to create some sort of For or While loop in alteryx, to identify that if an Invoice No is already tagged to a Doc No, it should not be tagged to another different Doc No. 

 

MichalM
Alteryx
Alteryx

Realised that and edited the post. A for loop or iterative macro in Alteryx's terms could be a solution depending on what you want to do if you have an extra Document No and no un-used Invoice No to use.

JoeS
Alteryx
Alteryx

@timothyyeo I have just updated my post above to include the iterative macro.

 

Sorry about leaving it blank, I fell foul of the same thing @MichalM did so needed to build a slightly more complex solution to deal with any further duplicates you have within your data

timothyyeo
7 - Meteor

Hi @MichalM ,

 

Yes, good question and I should clarify. The dataset I'm working with may not have a matching set of unique DocNos and InvNos. In those situations, the first match will be taken as correct.

 

DocNoInvNo
1231
4561

For the above table, (123, 1) will be taken as correct. Ideally, the second record (456, 1) is to be flagged in separate table for further investigation.

 

DocNoInvNo
1231
4561
1232
4562
1233
4563

For this table above, records (123, 1) and (456, 2) will be taken as correct.

(123, 3) and (456, 3) are to be flagged separately for investigation.

(456, 1) and (123, 2) are duplicates and simply discarded since both DocNos and InvNos are already matched correctly.

 

timothyyeo
7 - Meteor

Thanks @JosephSerpis ,

 

The solution has resolved my original problem.

timothyyeo
7 - Meteor

Hi @JoeS ,

 

I tried to run your flow with a tweak sample dataset:

clipboard_image_0.png

 

But the output does not seem correct (screenshots of the results of O Output and I Output below).

clipboard_image_3.png

clipboard_image_2.png

 

The correct output should be (123, 1), (456, 2), and (789, 3). As its my first time seeing an iterative macro in practice, is there anything additional I should do to run this workflow correctly?

Labels