I have two stand alone data sets I am trying to join:
Data Set 1 is 1710 records long and my first column is Identifier and the second column is secondary identifier. If there is a secondary identifier, I prefer to use this. So I've made a formula to replace the Identifier with the secondary identifier if there is one. No problems here.
In data set 2, which is 2264 records long, I only have Identifier. I want to replace the identifier in data set 2 with the preferred secondary identifier in Data set 1 if there is one.
I am using a join tool, I selected my Identifier column in dataset 2 and Identifier column in dataset 1. I only have identifer column from dataset1 joining into my dataset 2. The output is 199,216 records. Why is data being duplicated? I would of expected the output J anchor to have 2264 or less, why is it showing 199,216?
Thanks in advance,
Your issue is that the field you are joining on is not unique within the data set. This will result in one record per matching combination which leads to very large numbers. For example image a join between two data sets on a field called Record ID. If data set 1 has 3 occurrences of Record ID 1 and data set 2 also has 3 occurrences of Record ID 1 you will have 9 records with Record ID 1 in the output. See the attached sample demonstrating this phenomenon.
1. First check if either dataset has duplicate item identifiers.
2. If there are nulls in either dataset that could also be the reason for the large number of rows.
3. If there are other columns you can join each dataset with I'd recommend doing so.
It's kind of difficult to figure out the exact reason for your issue. If you could provide the workflow or inputs that would help me find a solution.
User | Count |
---|---|
107 | |
82 | |
69 | |
54 | |
40 |