Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Identifying and Removing Duplicate Transactions

brianfia
8 - Asteroid

I am creating a workflow that will match up two transactional files on Account Id and Amount, then identify if the transaction was found in both (Duplicate) or not. So far I am using the Join Tool to 'join' the two tables on Account Id and Amount, the Left and Right out going anchors are giving me the truly unique records that I need, but the Join Anchor is where my problem is arising.

 

The Join Anchor is giving me records where the account is truly in both tables and thus a "duplicate", but I am also getting records where the account is in one table with 2 rows of the same amount and in the other table with a single row of that same amount. 

 

Think something like this:

Table #1

V_IDV_ActV_Amt
1123100
2123100
323450

 

 

Table #2

L_IDL_ActL_Amt
7123100
823450
923450

 

After Join

V_IDV_ActV_AmtL_IDL_ActL_Amt
11231007123100
21231007123100
323450823450
323450923450


I have created unique row ids for each record to identify if the row is being duplicated from the join tool, below is an example of the type of results my Join Anchor is providing.

 

V_StrL_Str
97544
98544
149316
156316
22194
22494
52164
52165
52264
52265
561858
562858
615106
616106
622257
622258
75725

 

As you can see the L_Str row 544 is being joined to the V_Str rows 97 and 98. In this case V_Str row 97 and  L_Str row 544  are consider a "duplicate" and V_Str row 98 would need to be kept to continue through the process.

 

In another case V_Str row 622 is being  joined with L_Str rows 257 and 258. In this case L_Str row 257 and V_Str row 622  are consider a "duplicate" and L_Str row 258 would need to be kept to continue through the process.

 

There are also cases like V_Str rows 521 and 522 which are both joining individually to L_Str rows 64 and 65. In this case both V_Str rows 521 and 522 AND L_Str rows 64 and 65 are considered 'duplicate'.

 

Finally in a case like V_Str row 757, this record would be a straight forward duplicate so V_Str 757 and L_Str 25 should be considered "duplicate".

 

I am hoping that someone can come up with a good solution for this problem.

TIA

4 REPLIES 4
AndrewSu
Alteryx
Alteryx

@brianfia , I may be understanding the problem incorrectly, but why not just union the two tables and then do a unique tool based on Account and Amount to find the values?  see screenshot and attached workflow. 

 

AndrewSu_0-1656112335150.png

 

brianfia
8 - Asteroid

I may not have explained it well enough. Basically we are getting transactions from two different systems, A and B, but only transactions in A are fulfilled by A while System B contains some but not all the same transactions as system A and also contains unique transactions to system B. 

 

Another way to think of it is:

System A fulfills orders and keeps a list of the transactions that it fulfills. 

System B sends transactions to A to fulfill but also fulfills orders on its own. The transactions that it sends to A are the ones that I need to eliminate, but there are no identifiers that state these were sent to A.

 

I need to get a list of all orders fulfilled

 

Certain transactions can reside in both systems and other than account id and dollar amount of the transaction, there are no other unique fields.

 

** I am working to get more uniquely identifiable fields, but in the interim I need to find a solution for what I am currently given.

Amol_Telore
11 - Bolide

This solution will exclude those transactions which are present in both the systems and will only consider unique transaction from each of them. Hope this answers your query.

Amol_Telore_0-1657199535371.png

 

LiuZhang
9 - Comet

Take your system A as the left join table, system B as the right. Then L and R anchor should give you distinct orders.

You problem is handle the inner join. Since only B sends to A, if there is a match, then any matching ID from B will give you a list of things you can drop from B table.

Labels