Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Difference between Union and Join

architgharat
5 - Atom

Hi,

 

I was trying to solve one question where I have 2 data sets with customers went to movie theater A & B and wanted to find out the combination of unique customers who went to only one movie theater, not both. Also, the average age of the guests.

 

I used both union and join tool to find out the the unique combinations of customers but both the tools gave me different results.

 

Can anyone please advice which tool be using and when?

 

Thanks,

Archit

3 REPLIES 3
AbhilashR
15 - Aurora
15 - Aurora

Hi Archit, the Join tool is more applicable in this case.

 

The J output of the Join tool gives you the common universe between the two datasets (which in your case is folks who went to both theaters), and L & R outputs give out those that are not mutually inclusive (i.e. people who went to one theater and not the other. Refer to the image below to visually understand the univ. of data coming out of the Join tool.

 

AbhilashR_0-1586063066837.png

 

I have put together a sample implementation of the Join tool on your dataset to help further.

 

Rakesh_Jasuja
8 - Asteroid

Hi @architgharat ,

 

1. join tool will join the data sets and will provide you 3 outputs. 
    a. L - where customers are only available in Left table
    b. J - where customers are in both tables
    c. R - where customers are only available in Right table. 

 

2. union tool  - it will simply join all the tables

 

based on your requirements you should be using both.

1. use Join to find out customers who went to only 1 cinema

2. use union to find out average if you need to evaluate average (you can use unique app to filter out client to unique)

 

Hope this information helps. 🙂

echuong1
Alteryx Alumni (Retired)

I think the easiest way to differentiate between the two is to think about what your end goal is - are you trying to expand your dataset horizontally or vertically?

 

When I say vertically, I mean similar to appending two datasets together. Let's say I have a sales transaction file for the North, South, East, and West regions and want to get a complete dataset. In excel, I would simply copy and paste all four together. This is a Union. It's simply stacking two or more datasets together.

 

Conversely, let's say I want to expand a dataset horizontally. I have a dataset and I want to add additional pieces of information to it. Traditionally, this would be accomplished using a vlookup in excel. This is a Join. It adds additional information based on common fields.

 

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Join-vs-Union/ta-p/92694

 

Hope this helps!

Labels