Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

Why Your Join Is Getting More Records than Expected

Ozzie
Alteryx
Alteryx
Created

While the Join tool is easily one of the most used tools in Alteryx, it can also be one of the most misunderstood. This is even more likely true if a new user hasn’t previously used joins in any other data-manipulating platform or they are joining big tables where they might not be keeping track of the records inside the fields they are joining on.

Thus I will go over three different cases of joining two tables on one common field that you may encounter and what to expect in your results.

The workflow I will go over (attached in this article) is a workflow I made in 10.5 and contains the same visuals that are below.

Case #1: Joining by fields that both have unique records

J1.jpg

This is the simplest case and the easiest to understand. In this case we are joining by the fields named “Fruit” in both inputs. The join will align the records of both tables if the records in the “Fruit” fields match.

So the expected joined table (J Output) should look like this:

joinedtable1.jpg

Fairly straightforward

Case #2: Joining by fields where only one has unique records

J2.jpg

For this case we have two records for “Banana” in our Left input. This means the record with “Banana” in the Right input will match up twice and get appended to both records from the Left input. Our Output should look like this:

joinedtable2.jpg

Case #3: Joining by fields that both have non-unique records

This one can be the most challenging to understand but hopefully after looking at this visually and mathematically you will understand why this makes sense and what you can do to avoid unwanted records.

J3.jpg

In this case we have two records in “Fruit” that both have “Banana” in both tables. The same thing happens here just like the case above except now it happens twice because we have two records from the Right input that match up with two records from the Left input. So our output will have a total of 4 records that have the word “Banana” in it. The output should look like this:

joinedtable3.jpg

To look at this mathematically for both Case #3 and Case #2, you will know how many records you will get for each record instance by multiplying the amount of records which appear in both tables. In the example above since we are joining on “Fruit” and we see the record banana twice in both tables you can think of it as 2 bananas x 2 bananas=4 bananas.

If we added another row on our Right input like so:

leftinputtriple.jpg

We now have 3 x 2 so we should expect 6 combinations of records with “Banana” in the “Fruit” field

joinedtable4.jpg

bananas.jpg

If you have joined two tables and you are getting more records then you expected then this is the most likely culprit of your problem. If one doesn't understand the cases presented abovethis could certainly drive anyone bananas (pun intended). For best practices, I would follow these two steps before joining any tables:

1st: Check your tables for any duplicate records.

For the case above since we have three duplicates from our Right input we can just use the unique tool and get rid of these duplicates.

unique tool.jpg

The output will be the same exact output as the output from Case #2 since there will only be one record of “Banana” after the unique tool.

2nd: See if joining on multiple fields is the more appropriate option.

Example: We have Table 1 with two records containing ‘Banana’ for “Fruit” and two records containing ‘X’ for “Store ID” with “FruitID” corresponding with “Fruit”. Table 2 has both the same “Fruit” and “Store ID” fields except now we have a “Store employees” field which pairs up with the “Store ID”.

Left input.jpg store employees.jpg

Table 1 Table 2

We want all of this data together so that it all pairs up nicely. Ideally we should only have four records come out of our join as we want to get the ”Store Employees” and “FruitID” fields in one table. However, if we were to only join on one common field whether it’s “Store ID” or “Fruit”, we would get combinations and the data wouldn’t be aligned properly. See below.

Joining on “Fruit”

mismatchfruit.jpg

In this case we get two records where our Store IDs don’t match up.

Joining on “StoreID”

mismatch2.jpg

In this case we get two records where our Fruits won’t match up.

To correct this, we will configure our join to match on both Fruit fields and Store ID fields.

joinconfig.jpg

After running this, the joined table should look like this:

goodmatch.jpg

When joining with multiple fields we make sure that records from both of these fields match before joining.

An easy way to think about this is imagine that ”Fruit” and “Store ID” are being merged together. Now all of a sudden we have unique records because we would now have Banana X and Banana Y as two unique entities and our join will now become a 1 record x 1 record join.

joinmatchmulti.jpg

Attachments
Comments
Rohit_Bajaj
9 - Comet
Hi, Thanks for this post. It helps to understand the Join Tool better. If you can also help provide the solution to one more variation to Case #3: Joining by fields that both have non-unique records. I have got two tables containing Fruit related information as below - Table 1 Table 2 Fruit Unique_Id Fruit Unique_Id Apple X Apple A Banana K Banana B Banana K Banana D Cantaloupe Z Cantaloupe C On joining both the tables on Fruit Column the output would look like - Fruit Unique_Id Right_Fruit Right_Unique_Id Apple X Apple A Banana K Banana B Banana K Banana D Banana Y Banana B Banana Y Banana D Cantaloupe Z Cantaloupe C Talking of Unique Ids, the requirement is to get either K,B and Y,D together or K,D and Y,B together. i.e. since the source data itself is duplicated, it is OK in output to have duplicates in output. There is no way to know actually whether K is linked to D or B and same goes with Y. Unfortunately there are no other join columns apart from Fruit, any other combination for K,Y,B and D would be deemed incorrect. For e.g. if the Output is K,B and K,D it would imply one record from Table 1 got joined to two records from Table 2. Again K,B and Y,B would imply same as above with the Tables being swapped. Idea is to get 1-1 mapping between the two tables, exact linking of records is not a concern. Thanks, Rohit Bajaj
amieding
5 - Atom

Hi OP, 

 

Thanks for walking through the join tool in detail! I'm running into a weird problem still, basically I have 2 files that has the same number of records, they have the same identifiers, 1 file just has a couple more columns I want to bring in and add to the other file. I used the join tool with multiple fields of identifiers to be as specific as possible, but the total resulted in the join (no left or right exceptions) were more than double the amount of records in the 2 files. How is this possible? Can someone give me some pointers?

 

Thanks in advance!

Ozzie
Alteryx
Alteryx

Hi @amieding, it is probably because the identifiers you are using are not unique. As explained in the post above, if you join on non-unique identifiers the join tool will multiply your records because of the number of possible joins between those non-unique records. I would add a unique tool before both left and right input connection to the join tool and do a unique on ALL the identifiers. If you see duplicates after the "D" output, this will confirm the likely cause. 

 

Hope this helps!

Ozzie

amieding
5 - Atom

Thank you Ozzie! I get what you're staying but the 2 files originally came from the same file and the only difference is the extra columns, that's why I'm confused as to why it created a significantly larger record amount. I will try using a unique tool and see if that helps. Thanks again!

FM-UoM
6 - Meteoroid

Hi, hoping you can help.  I'm using the join tool and getting exactly the problem you have described here.  I have unique records in my first file, and should have unique records in my second file, but that isn't always the case.  When there are duplicate records the additional field has multiple values (so it would be like your example above where Banana is linked to both store X and store Y).  Where that happens I need to strip out one of the values so that the files both contain unique records, and ideally I would use the Unique tool, but the value which needs to be stripped out depends on information which isn't contained in the file, so I can't see that the Unique tool would work for me.  Is there any way I can get the workflow to show me somehow when the number of records from my first output is different from the number of records in the post join output?

 

I have tried putting a count on both outputs, putting those through a join tool and attaching Expect Zero Records tools to both the left and right outputs from that, but it feels very clumsy, and I'm sure there must be a better way of picking it up.

 

Any ideas would be gratefully received.

JonaV
8 - Asteroid

Joining on 2 fields is like doing a Concatenate formula in Excel and then performing a Vlookup

Debjani
7 - Meteor

Hi,

This article is a good one and provides inputs to the working of Join tool.. I am a newcomer to the world of data and am learning a tool for the first time but these articles are making life bit easier for me. Thanks again..

 

 

George_Fischetti
8 - Asteroid

@OzzieThis is a great summary....

 

Just one question, I've been using other tools and traditionally the fields I've joined on need to be the same size (ex 10 character string).  Does Alteryx handle this differently?  Can I join a 10 Character string with a 12 character string or should I be performing more cleanup.  In this example I also know that there are no strings longer that 10 characters.

 

Thanks!  George

ck2019
9 - Comet

I have a strange situation where i am trying to join a true and false filter output against different criteria and the sum of my left, middle and right output is less than the total incoming values from the filter input - what could be causing that - any clues?

lepome
Alteryx Alumni (Retired)

@ck2019 
Sorry, I'm not great at visualizing what you're doing based on that description.  Would you be able to provide a screenshot or better yet a workflow with data that shows what you're seeing?