Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Tool Mastery

Explore a diverse compilation of articles that take an in-depth look at Designer tools.
Become a Tool Master

Learn how you can share your expertise with the Community

LEARN MORE

Tool Mastery | Join

Ozzie
Alteryx
Alteryx
Created

Join.pngThis article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Join Tool on our way to mastering the Alteryx Designer:

 

The Join Tool is the quintessential tool for data blending within Alteryx. As such, it is also one of the most widely used tools. The Join Tool allows you to join data together from two different sources in two different ways: by record position and by specific fields. Selecting by record position will attach the two datasets together where it will match up each record by the position it is in. Thus record 1 of the left dataset will be in the same row as record 1 on the right in the J output and so on. If one dataset from either side has more records than the other those records will not be joined and they will be placed in there corresponding right or left output (L or R). Joining by specific field will match records up based on a specific field or multiple fields. This article goes into how that option works in more depth and detail. I highly recommend it as a read, as it covers frequent behaviors of the tool that you might run into.

 

Below are some examples of how you might use a join:

 

  • Joining to bring a field not available in other data sources. As you can see in the two tables below, both have fields that are exclusive to those tables and we want to bring those fields together. We would do this by joining on "Fruit" as this is the common field/identifier between both datasets:

     

J1.jpg

 

 

  • Using Join as a filter. You can also use a join to filter out records if you have a secondary dataset or list you want to filter by. Seen below, since Customer ID 3 and 4 is not contained in Table 2, records containing 3 and 4 in Customer ID will get dropped from the center join:

 

customer ID table join.jpg customer ID table join 2.jpg

Table 1 Table 2

 

 

customer ID table join 3.jpg

Joined result of Table 1 and Table 2

 

 

If you're noticing your join output has fewer records than anticipated, be sure to check out this article!

 

By now, you should have expert-level proficiency with the Join Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.

 

Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.

Additional Information
Click on the corresponding language link below to access this article in another language -
Portuguese
Spanish
French
German
Japanese

Comments
DataPsycho
7 - Meteor

Hello, After using the join in alteryx I have drawn a hypothesis. So I would like to know, What I am thinking It is true or not. When joining from two datasource with more than one key, What I am realising It does not do the samething what SQL server do. 

 

So for example I have 3 field A, B, C which is common in both two table. So after joining If B and C is NULL for some row then SQL server does not join the data. But alteryx join the data only based on A key, which is not null.

 

So the Hypothesis is when there is more than one key to join,

Alteryx joins data from two table when atleast one key is available, but SQL server join if and only if all key are not null in case of simple join or inner join.

MattD
Alteryx Alumni (Retired)

Hey DataPsycho!

 

This is likely because the Join Tool recognizes NULL cells as identical values - that is, it finds a match in the A key, and then joins every NULL of B and C to every NULL of B and C in your other dataset. You could confirm this by checking to see if there are duplicates exiting your join. The behavior is described here in the context of one join condition field: https://community.alteryx.com/t5/Alteryx-Knowledge-Base/How-the-Join-Tool-Works-and-Why-You-Might-Be...

 

Happy Blending!

MattD

plim
6 - Meteoroid

Hello,

 

Thanks for this article.

 

Is there a limit to the number of rows when performing a join?

 

 

I am very impressed with the speed at which Alteryx Designer is able to load and join two datasets using key fields (see example below).

 

Note: Academic exercise only using government/NHS data available on the public internet.

 

 

BrowseProfile.png

 

 

JoinBrowse.png

 

 


I am impressed that it took only 2 minutes to join. I tried this on a different analytics tool and it took 20 minutes!

 

Part of the challenge in data blending is to locate the key fields in desperate sources. However the GUI here helps to make the data exploration easier.

I am also enjoying the on demand videos available at: https://www.alteryx.com/on-demand-training

Keep up the good work!

 

Ozzie
Alteryx
Alteryx

Hi @plim,

 

There is no limit in the number of rows. The only limit that exists is the hardware on your machine and its ability to handle large data. Glad you are impressed with the tool. This is what we strive to hear from our users.

 

Cheers!

JORGE4900
8 - Asteroid

I am currently encountering that hardware limit that Ozzie is mentioning; I have been running queries that take up to an hour to run; too large files, but I need a faster computer with more memory.

ChrisTX
16 - Nebula
16 - Nebula

This information would be helpful in the tool mastery article:

 

The Select within the Join tool should only affect the J output.
   the select/de-select and the re-name functionality only affects the J output anchor.

 

Records output through the L and R outputs are passed through unmodified, including fields that are deselected in the Join tool.
   the L and R output anchors will always retain the original input field names and all fields will be “selected” for output

hamza_khan
7 - Meteor

Interesting hypothesis drawn by my fellow learner @DataPsycho regarding the workings of how join works in alteryx compared to SQL. What @MattD said is true and I implemented it. Create an empty table with null values and try comparing them. When two fields have null values and when you check if they are equal to each other, you will get a true value. 

 

Conclusion : Multiple field joins in alteryx is like an AND statement. 

 

 

Dante_Healy
6 - Meteoroid

This is a great tool - reminds of my Joins in SQL, but a lot easier to execute. 

Hangxing_Zhou
7 - Meteor

It seems similar as the SQL "inner/left/right join?

Frachiu
7 - Meteor

I found a typo:

Change "there" to "their" corresponding Left or Right output (when records don't join).

Frachiu
7 - Meteor

In the example use join as a filter it states: ID 3 and 4 is not contained in Table 2

however, ID 4 is not present on the first dataset, I suggest to either add it to the table or remove the 4 from the text.

Tyrane782001
7 - Meteor

ok

Gayitri420
7 - Meteor

thanks for this explanation

srilakshmi123
7 - Meteor

hhtps://community.altery.com/t5/alteryx-community-english/ct-p/externa

Anitha1289
7 - Meteor

This is a great tool - reminds of my Joins in SQL, but a lot easier to execute. 

0
 
 
 
 
Shyam_amarapu_2002
7 - Meteor

7-Meteor

Mgaurav2grt
7 - Meteor

7 meteor