Hello community,
I'm trying to work on a problem where I want to join two tables on multiple primary fields such that the values are not repeating after the join. For example, in the sample workflow attached, I have two input files having id, name, state, account, month-year, year, and sales as their columns. If I join the two tables on id, state, and account, the records are getting duplicated from the left table for each sales value in the right table, specifically for the names that are common in both the tables (For example, Jim and Erica). However, I want my output in this format such that there are no duplication of records:
Id | Name | State | Account | Month-Year | Sales | Right_Month-Year | Sales 2 |
3 | Chuck | PA | 19 | 2022-03 | 36 | 2022-03 | 35 |
3 | Chuck | PA | 19 | 36 | 2021-01 | 17 | |
3 | Chuck | PA | 19 | 36 | 2021-03 | 19 | |
3 | Chuck | PA | 19 | 36 | 2021-07 | 22 | |
3 | Chuck | PA | 19 | 36 | 2021-08 | 24 | |
3 | Chuck | PA | 19 | 36 | 2022-01 | 27 | |
3 | Chuck | PA | 19 | 36 | 2022-02 | 30 | |
3 | Chuck | PA | 19 | 36 | 2022-04 | 32 | |
3 | Chuck | PA | 112 | 2022-04 | 41 | ||
5 | Jim | TX | 87 | 2021-01 | 54 | ||
5 | Jim | TX | 87 | 2021-02 | 34 | ||
5 | Jim | TX | 87 | 2021-03 | 25 | ||
5 | Jim | TX | 87 | 2021-04 | 32 | ||
5 | Jim | TX | 87 | 2021-05 | 24 | ||
5 | Jim | TX | 87 | 2021-06 | 27 | ||
5 | Jim | TX | 87 | 2021-07 | 29 | ||
5 | Jim | TX | 87 | 2021-09 | 38 | ||
5 | Jim | TX | 87 | 2021-10 | 42 | ||
5 | Jim | TX | 87 | 2021-11 | 43 | ||
5 | Jim | TX | 87 | 2021-12 | 49 | ||
5 | Jim | TX | 87 | 2022-01 | 50 | 2022-01 | 43 |
5 | Jim | TX | 87 | 2022-02 | 56 | 2022-02 | 56 |
5 | Jim | TX | 87 | 2022-03 | 57 | 2022-03 | 35 |
5 | Jim | TX | 87 | 2022-04 | 59 | 2022-04 | 44 |
5 | Jim | TX | 87 | 2022-05 | 62 | 43 | |
7 | Erica | MN | 72 | 2021-06 | 15 | 59 | |
7 | Erica | MN | 72 | 2021-09 | 16 | 59 | |
7 | Erica | MN | 72 | 2022-01 | 19 | 2022-01 | 59 |
7 | Erica | MN | 72 | 2022-02 | 20 | ||
7 | Erica | MN | 72 | 2022-03 | 22 | ||
7 | Erica | MN | 72 | 2022-04 | 23 | 2022-04 | 70 |
7 | Erica | MN | 72 | 2022-05 | 24 | ||
11 | Heather | UT | 33 | 2022-01 | |||
11 | Heather | UT | 33 | 2022-02 | |||
12 | Amara | NV | 49 | 2022-04 | |||
12 | Amara | NV | 92 | 2022-01 | |||
12 | Amara | NV | 92 | 2022-02 | |||
12 | Amara | NV | 170 | 2022-02 | |||
12 | Amara | NV | 170 | 2022-03 |
Can someone please help me how to approach this problem?
Hey @keeprollin,
What logic do you want to use to remove duplicate records. For instance, Erica has these three:
7 | Erica | MN | 72 | 2021-06 | 15 | 59 | |
7 | Erica | MN | 72 | 2021-09 | 16 | 59 | |
7 | Erica | MN | 72 | 2022-01 | 19 | 2022-01 | 59 |
Should we just take the first record? Or do we need to sum the sale or anything like that?
If you just want the first record you can use the sample tool before hand:
This will ensure there is only 1 id, state and account number to join on.
Any questions or issues please ask :)
HTH!
Ira
@keeprollin - I added 'Month-Year' to the Join fields, then brought the Left side output into the Union, and finally added a sort. The repeating values are gone!
Solution attached. Please mark this solution as 'accepted' if it works for you! Or let me know if you need more help :-)