on 08-08-2016 05:23 PM - edited on 04-21-2023 09:30 PM by yhchen
This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Join Tool on our way to mastering the Alteryx Designer:
The Join Tool is the quintessential tool for data blending within Alteryx. As such, it is also one of the most widely used tools. The Join Tool allows you to join data together from two different sources in two different ways: by record position and by specific fields. Selecting by record position will attach the two datasets together where it will match up each record by the position it is in. Thus record 1 of the left dataset will be in the same row as record 1 on the right in the J output and so on. If one dataset from either side has more records than the other those records will not be joined and they will be placed in there corresponding right or left output (L or R). Joining by specific field will match records up based on a specific field or multiple fields. This article goes into how that option works in more depth and detail. I highly recommend it as a read, as it covers frequent behaviors of the tool that you might run into.
Below are some examples of how you might use a join:
Table 1 Table 2
Joined result of Table 1 and Table 2
If you're noticing your join output has fewer records than anticipated, be sure to check out this article!
By now, you should have expert-level proficiency with the Join Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.
Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.
Additional Information
Click on the corresponding language link below to access this article in another language -
Portuguese
Spanish
French
German
Japanese
Hello, After using the join in alteryx I have drawn a hypothesis. So I would like to know, What I am thinking It is true or not. When joining from two datasource with more than one key, What I am realising It does not do the samething what SQL server do.
So for example I have 3 field A, B, C which is common in both two table. So after joining If B and C is NULL for some row then SQL server does not join the data. But alteryx join the data only based on A key, which is not null.
So the Hypothesis is when there is more than one key to join,
Alteryx joins data from two table when atleast one key is available, but SQL server join if and only if all key are not null in case of simple join or inner join.
Hey DataPsycho!
This is likely because the Join Tool recognizes NULL cells as identical values - that is, it finds a match in the A key, and then joins every NULL of B and C to every NULL of B and C in your other dataset. You could confirm this by checking to see if there are duplicates exiting your join. The behavior is described here in the context of one join condition field: https://community.alteryx.com/t5/Alteryx-Knowledge-Base/How-the-Join-Tool-Works-and-Why-You-Might-Be...
Happy Blending!
MattD
Hello,
Thanks for this article.
Is there a limit to the number of rows when performing a join?
I am very impressed with the speed at which Alteryx Designer is able to load and join two datasets using key fields (see example below).
Note: Academic exercise only using government/NHS data available on the public internet.
I am impressed that it took only 2 minutes to join. I tried this on a different analytics tool and it took 20 minutes!
Part of the challenge in data blending is to locate the key fields in desperate sources. However the GUI here helps to make the data exploration easier.
I am also enjoying the on demand videos available at: https://www.alteryx.com/on-demand-training
Keep up the good work!
Hi @plim,
There is no limit in the number of rows. The only limit that exists is the hardware on your machine and its ability to handle large data. Glad you are impressed with the tool. This is what we strive to hear from our users.
Cheers!
I am currently encountering that hardware limit that Ozzie is mentioning; I have been running queries that take up to an hour to run; too large files, but I need a faster computer with more memory.
This information would be helpful in the tool mastery article:
The Select within the Join tool should only affect the J output.
the select/de-select and the re-name functionality only affects the J output anchor.
Records output through the L and R outputs are passed through unmodified, including fields that are deselected in the Join tool.
the L and R output anchors will always retain the original input field names and all fields will be “selected” for output
Interesting hypothesis drawn by my fellow learner @DataPsycho regarding the workings of how join works in alteryx compared to SQL. What @MattD said is true and I implemented it. Create an empty table with null values and try comparing them. When two fields have null values and when you check if they are equal to each other, you will get a true value.
Conclusion : Multiple field joins in alteryx is like an AND statement.
This is a great tool - reminds of my Joins in SQL, but a lot easier to execute.
It seems similar as the SQL "inner/left/right join?
I found a typo:
Change "there" to "their" corresponding Left or Right output (when records don't join).
In the example use join as a filter it states: ID 3 and 4 is not contained in Table 2
however, ID 4 is not present on the first dataset, I suggest to either add it to the table or remove the 4 from the text.
ok
thanks for this explanation
hhtps://community.altery.com/t5/alteryx-community-english/ct-p/externa
This is a great tool - reminds of my Joins in SQL, but a lot easier to execute.
7-Meteor
7 meteor