Join versus Union
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
How do I decide when to use the join tool versus the union tool?
- Labels:
- Workflow
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @Mikebarb1
You might look at this knowledge article: https://knowledge.alteryx.com/index/s/article/Join-vs-Union-1583460916101
Essentially a join is similar to a vlookup in excel, adding or supplementing additional information (more columns) to your data. Union is combining datasets to form a larger dataset (from a record count perspective)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@Mikebarb1
In simple word, Join arranges the two data stream horizontally, and Union arranges them vertically. 😁
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
As others have said, think of these as either data lookups (join) or a data stacker (union)
- Join - similar to a VLOOKUP in Excel, but way more powerful
- you need some sort of a key, which will be the same in both tables
- For instance, you'd have a customer ID in left table where you have order info - like items in a shopping cart
- Then you have a right table, which also has customer ID, but also has customer address
- You used Join tool to use the Customer ID to pull the address columns from right table and add them to the left table
- So, your amount of records (ideally) remains the same, but you enrich your data with additional columns from another data source
- Join has 3 output anchors
- L - records which are found in left table, but not found a match in right table
- J - records were found in both right and left table and a match was made based on your key
- R - records were found in right table, but not the left table
- Having these 3 outputs is super helpful and allows you to do all sorts of things
- If you do some SQL, you can think of L as left outer join, J as an inner join and R anchor as right outer join - behaves more or less the same in this Join tool
- Union - Excel doesn't have a built in tool to do this, but think of it as a data stacker
- Example would be having data with 12 tabs in Excel, each tab is for one month and all tabs have the same columns
- If you want to merge all this data in Excel into one huge table for the whole year you'd need to create a 13th tab
- Then copy and paste the data under each other on teh 13th tab to make up the whole year
- So, if each month had 100 records in each tab, your final stacked tab for the whole year would have 1200 records
- Lots of work and prone to human error
- Union does this for you!
- As long as your data sources have the same data types and column order, you can all point them into a Union tool and it wills tack the data for you into one giant table - either by position of the column or by column name
- Very nifty when you are trying to either
- merge many files of same structure
- do some extra calculations in Alteryx where you split the data and then you need to merge it back together
The dark green column is a primary key used to do the lookup in the Join tool.
Union tool doesn't need a primary key - it merges (stacks) the data by either column name or column position.
