I am creating a workflow that will match up two transactional files on Account Id and Amount, then identify if the transaction was found in both (Duplicate) or not. So far I am using the Join Tool to 'join' the two tables on Account Id and Amount, the Left and Right out going anchors are giving me the truly unique records that I need, but the Join Anchor is where my problem is arising.
The Join Anchor is giving me records where the account is truly in both tables and thus a "duplicate", but I am also getting records where the account is in one table with 2 rows of the same amount and in the other table with a single row of that same amount.
Think something like this:
Table #1
V_ID | V_Act | V_Amt |
1 | 123 | 100 |
2 | 123 | 100 |
3 | 234 | 50 |
Table #2
L_ID | L_Act | L_Amt |
7 | 123 | 100 |
8 | 234 | 50 |
9 | 234 | 50 |
After Join
V_ID | V_Act | V_Amt | L_ID | L_Act | L_Amt |
1 | 123 | 100 | 7 | 123 | 100 |
2 | 123 | 100 | 7 | 123 | 100 |
3 | 234 | 50 | 8 | 234 | 50 |
3 | 234 | 50 | 9 | 234 | 50 |
I have created unique row ids for each record to identify if the row is being duplicated from the join tool, below is an example of the type of results my Join Anchor is providing.
V_Str | L_Str |
97 | 544 |
98 | 544 |
149 | 316 |
156 | 316 |
221 | 94 |
224 | 94 |
521 | 64 |
521 | 65 |
522 | 64 |
522 | 65 |
561 | 858 |
562 | 858 |
615 | 106 |
616 | 106 |
622 | 257 |
622 | 258 |
757 | 25 |
As you can see the L_Str row 544 is being joined to the V_Str rows 97 and 98. In this case V_Str row 97 and L_Str row 544 are consider a "duplicate" and V_Str row 98 would need to be kept to continue through the process.
In another case V_Str row 622 is being joined with L_Str rows 257 and 258. In this case L_Str row 257 and V_Str row 622 are consider a "duplicate" and L_Str row 258 would need to be kept to continue through the process.
There are also cases like V_Str rows 521 and 522 which are both joining individually to L_Str rows 64 and 65. In this case both V_Str rows 521 and 522 AND L_Str rows 64 and 65 are considered 'duplicate'.
Finally in a case like V_Str row 757, this record would be a straight forward duplicate so V_Str 757 and L_Str 25 should be considered "duplicate".
I am hoping that someone can come up with a good solution for this problem.
TIA
@brianfia , I may be understanding the problem incorrectly, but why not just union the two tables and then do a unique tool based on Account and Amount to find the values? see screenshot and attached workflow.
I may not have explained it well enough. Basically we are getting transactions from two different systems, A and B, but only transactions in A are fulfilled by A while System B contains some but not all the same transactions as system A and also contains unique transactions to system B.
Another way to think of it is:
System A fulfills orders and keeps a list of the transactions that it fulfills.
System B sends transactions to A to fulfill but also fulfills orders on its own. The transactions that it sends to A are the ones that I need to eliminate, but there are no identifiers that state these were sent to A.
I need to get a list of all orders fulfilled
Certain transactions can reside in both systems and other than account id and dollar amount of the transaction, there are no other unique fields.
** I am working to get more uniquely identifiable fields, but in the interim I need to find a solution for what I am currently given.
Take your system A as the left join table, system B as the right. Then L and R anchor should give you distinct orders.
You problem is handle the inner join. Since only B sends to A, if there is a match, then any matching ID from B will give you a list of things you can drop from B table.