Hello, I am not sure what I am doing wrong. For example purposes I am simplifying my problem.
I have two files:
File 1:
ID | Name | Manager | Status |
1 | James | Annie | Open |
5 | Mark | Tyler | Closed |
File 2:
ID | Number | Stage | NDA |
1 | 100 | Planning | Yes |
2 | 101 | Planning | No |
... | |||
100 | 205 | Complete | Yes |
NOTE: EVERY ID IN FILE 1 EXISTS IN FILE 2
My goal is to find where the ID's in File 1 exist in File 2, and to join their data. However, when I use the J output, I end up with MORE records than there are in file 1.
So in reality, File 1 has 4000 records. File 2 has 9100 records. After using the J output with the join tool, I end up with 5379 records. I have no clue how or why this is happening. I want there to be only 4000 records, the same number in file 1. Here is what my join tool looks like:
PLEASE NOTE: THERE ARE NO DUPLICATES IN FILE 1 OR 2
I have played around with the three L, J and R but still don't get my desired result. What am I doing wrong?
Hi @shaheer
It seems that you have duplicated Project Number`s on your second file, if you did not this would not happen. Try to be sure about that. One way of being sure about it is putting a unique tool after the input tool of each dataset and assign the Project Number column.
EXAMPLE OF DUPLICATE SITUATION
File 1
Project Number |
1 |
2 |
File 2
Project Number |
1 |
1 |
2 |
JOIN Result
Project Number |
1 |
1 |
2 |
Hi @shaheer
What you are doing is right, but the problem is with the dataset.
Breaking down to explain.
File 1 has 4000 records.
File 2 has 9000 records.
But the J anchor needs to output only 4000 records, but it gives 5000+ records.
Because ID as in 1 in file 1 is matched with ID as 1 in File 2 twice.
So leading to increase in count.
Many thanks
Shanker V
Hi @shaheer
In other words to explain you, if you have unique records in File 1 ID and unique records in File 2 ID.
Then only we can expect 4000records in J anchor.
Note: Unique refers here as the same value repeated twice in ID, not both files should have same record count.
Please find the attached examples with small dataset.
Note: The workflow used to achieve the solution is attached which can be downloaded to see how the solution works.
If you believe your problem has been resolved. Please mark helpful answers as a solution so that future users with the same problem can find them more easily!!!!
Many thanks
Shanker V