Hello experts,
My Left table has 77MB and Right Table has 27MB, i am trying to pull the maching records from the Left table(Right Join).
when i checked output its coming more than 6GB.
Doubt on below points please help me
Regards,
Niranjan
Solved! Go to Solution.
Hi @NiranjanK1,
This post explains really well why you see duplicates here:
You are joining the data on the columns that have duplicate positions in both data bases. The easiest way (it isn't best practice) is to use the unique tool in order to remove the positions that are duplicated and keep only one record for the position that you are using in join tool.
In short, you need to clean your data. Remove the duplicate positions and the workflow will work for you.
Please mark my post as a solution if this was helpful for you. If you have more questions please let me know!
@Emil_Kos Thanks for your reply
Both the data sources have same set of fields, if im using Right outer join do i need to un select left table fields. im not clear on below points, please suggest me
Regards,
Niranjan
Hi,
I think you don't need to unselect anything and the problem is something else.
For example, in both files, you are using ID column
ID Column Left Table | ID Column Right Table |
1 | 1 |
1 | 1 |
1 | 1 |
2 | 1 |
2 | 2 |
3 | 2 |
You would expect to join the data and have 6 records but because you have duplicate values in the ID column in both sets you will multiply the results
If you will use unique column and remove the duplicated ID values the workflow will start working for you.
Hi @NiranjanK1,
I have created a workflow for you so it will be easier for me to explain.
In the example provided in a previous post, there have been duplicates in the ID column. That is the reason that we have 16 row of the data generated just from 6 lines:
If I have used the unique column I got only 6 output lines as there are duplicates only in one of the data sources. I achieved it by using a unique tool.
I hope it is easier to understand.
@Emil_Kos thank you, Now I got better under standing on the concept .
But i have both the set of data source fields in my output. Do i need to un-select the Left table fields. Kindly suggest me. please look have a look at below images.
Hi @NiranjanK1,
Happy to hear that my post helped you to clarify what is happening in a workflow.
Regarding your second question, it is a Really good question. Please click option and click deselect duplicated fields:
Thanks to that you will keep only one column from each data sets and you will not have plenty of duplicated columns.
@Emil_Kos sorry to bother you again, If i use options and click deselect duplicated fields.
It is removing the data from right table. But im using Right OuterJoin. Do i need to remove the the fields from the left table.
Hi @NiranjanK1,
It will not matter the output will be exactly the same. Unless the column names differ and you want to keep the column names from the right table.
If that is the case I would suggest switching left and right connection:
@Emil_Kos thanks a lot, Yes make sense i tested as well, got the expected output.
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
6 |