This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
on 05-26-201601:59 PM- edited
2 weeks ago
While the Join tool is easily one of the most used tools in Alteryx, it can also be one of the most misunderstood. This is even more likely true if a new user hasn’t previously used joins in any other data-manipulating platform or they are joining big tables where they might not be keeping track of the records inside the fields they are joining on.
Thus I will go over three different cases of joining two tables on one common field that you may encounter and what to expect in your results.
The workflow I will go over (attached in this article) is a workflow I made in 10.5 and contains the same visuals that are below.
Case #1: Joining by fields that both have unique records
This is the simplest case and the easiest to understand. In this case we are joining by the fields named “Fruit” in both inputs. The join will align the records of both tables if the records in the “Fruit” fields match.
So the expected joined table (J Output) should look like this:
Case #2: Joining by fields where only one has unique records
For this case we have two records for “Banana” in our Left input. This means the record with “Banana” in the Right input will match up twice and get appended to both records from the Left input. Our Output should look like this:
Case #3: Joining by fields that both have non-unique records
This one can be the most challenging to understand but hopefully after looking at this visually and mathematically you will understand why this makes sense and what you can do to avoid unwanted records.
In this case we have two records in “Fruit” that both have “Banana” in both tables. The same thing happens here just like the case above except now it happens twice because we have two records from the Right input that match up with two records from the Left input. So our output will have a total of 4 records that have the word “Banana” in it. The output should look like this:
To look at this mathematically for both Case #3 and Case #2, you will know how many records you will get for each record instance by multiplying the amount of records which appear in both tables. In the example above since we are joining on “Fruit” and we see the record banana twice in both tables you can think of it as 2 bananas x 2 bananas=4 bananas.
If we added another row on our Right input like so:
We now have 3 x 2 so we should expect 6 combinations of records with “Banana” in the “Fruit” field
If you have joined two tables and you are getting more records then you expected then this is the most likely culprit of your problem. If one doesn't understand the cases presented above this could certainly drive anyone bananas (pun intended). For best practices, I would follow these two steps before joining any tables:
1st: Check your tables for any duplicate records.
For the case above since we have three duplicates from our Right input we can just use the unique tool and get rid of these duplicates.
The output will be the same exact output as the output from Case #2 since there will only be one record of “Banana” after the unique tool.
2nd: See if joining on multiple fields is the more appropriate option.
Example: We have Table 1 with two records containing ‘Banana’ for “Fruit” and two records containing ‘X’ for “Store ID” with “FruitID” corresponding with “Fruit”. Table 2 has both the same “Fruit” and “Store ID” fields except now we have a “Store employees” field which pairs up with the “Store ID”.
Table 1 Table 2
We want all of this data together so that it all pairs up nicely. Ideally we should only have four records come out of our join as we want to get the ”Store Employees” and “FruitID” fields in one table. However, if we were to only join on one common field whether it’s “Store ID” or “Fruit”, we would get combinations and the data wouldn’t be aligned properly. See below.
Joining on “Fruit”
In this case we get two records where our Store IDs don’t match up.
Joining on “StoreID”
In this case we get two records where our Fruits won’t match up.
To correct this, we will configure our join to match on both Fruit fields and Store ID fields.
After running this, the joined table should look like this:
When joining with multiple fields we make sure that records from both of these fields match before joining.
An easy way to think about this is imagine that ”Fruit” and “Store ID” are being merged together. Now all of a sudden we have unique records because we would now have Banana X and Banana Y as two unique entities and our join will now become a 1 record x 1 record join.