Hi All,
I have a set of records and would like to remove duplicate values based on two columns.
Document No | Invoice No |
123 | 1 |
456 | 1 |
123 | 2 |
456 | 2 |
The correct set of records to be returned should be:
Document No | Invoice No |
123 | 1 |
456 | 2 |
In other words, the invoice number cannot be repeated. If Invoice Number 1 has already matched to Document No 123, then it should not be matched again to Document No 456.
How can I do this in Alteryx? Using the unique tool either on both columns or one of the columns does not work.
Solved! Go to Solution.
Is it always going to be the case that you will have a matching number of unique Document and Invoice numbers? If that's the case you'd probably want to do something like this
If that's not the case, what Invoice no will you use?
Hi @timothyyeo mocked something up that produces the output. Would need to be tested against you data to ensure it would work.
Hi @MichalM,
Thanks for the fast reply. However, the outcome I need is:
Doc No | Inv No |
123 | 1 |
456 | 2 |
However, the outcome of your method is returning
Doc No | Inv No |
123 | 1 |
456 | 1 |
Effectively, I feel the most complete solution would be to create some sort of For or While loop in alteryx, to identify that if an Invoice No is already tagged to a Doc No, it should not be tagged to another different Doc No.
Realised that and edited the post. A for loop or iterative macro in Alteryx's terms could be a solution depending on what you want to do if you have an extra Document No and no un-used Invoice No to use.
@timothyyeo I have just updated my post above to include the iterative macro.
Sorry about leaving it blank, I fell foul of the same thing @MichalM did so needed to build a slightly more complex solution to deal with any further duplicates you have within your data
Hi @MichalM ,
Yes, good question and I should clarify. The dataset I'm working with may not have a matching set of unique DocNos and InvNos. In those situations, the first match will be taken as correct.
DocNo | InvNo |
123 | 1 |
456 | 1 |
For the above table, (123, 1) will be taken as correct. Ideally, the second record (456, 1) is to be flagged in separate table for further investigation.
DocNo | InvNo |
123 | 1 |
456 | 1 |
123 | 2 |
456 | 2 |
123 | 3 |
456 | 3 |
For this table above, records (123, 1) and (456, 2) will be taken as correct.
(123, 3) and (456, 3) are to be flagged separately for investigation.
(456, 1) and (123, 2) are duplicates and simply discarded since both DocNos and InvNos are already matched correctly.
Hi @JoeS ,
I tried to run your flow with a tweak sample dataset:
But the output does not seem correct (screenshots of the results of O Output and I Output below).
The correct output should be (123, 1), (456, 2), and (789, 3). As its my first time seeing an iterative macro in practice, is there anything additional I should do to run this workflow correctly?