community
cancel
Showing results for 
Search instead for 
Did you mean: 

Engine Works Blog

Under the hood of Alteryx: tips, tricks and how-tos.
Sr. Community Content Manager
Sr. Community Content Manager

Have you ever noticed that you can't do a left join or a right join in the traditional sense in a single step with Alteryx? Normally to accomplish this you'd add a union tool after the join like so:

 Left join

I thought it would be nice to reduce this to one step, as pointed out here. While I was at it, I thought I'd add some functionality to filter the data in the same step, reducing the need to add a filter tool after the join, as is often the case. Sandeep expressed an interest here in being able to do this with date ranges. 

 

Attached are some example uses of the macro I came up with to accomplish this. The first example shows a left join. The second example, "Inner join with condition (date range)", performs an inner join on a certain column, then filters the joined dataset by checking if a date from one incoming dataset falls between two dates from the other. In SQL you would accomplish the same thing like so:

 

SQL join

 

In SAS (assuming the datasets are sorted properly and the join by columns have the same name), you'd do this:

 

SAS join

 

Another thing you can do with this macro is a cross join (Cartesian product of rows from incoming data) with an optional filter condition. But be careful, before applying the filter, the resulting join will contain as many rows as the product of the number of rows of the incoming datasets. Meaning if you're joining 2 datasets that each contain a million records, the resulting dataset will contain one trillion records! A potential use for the cross join is illustrated in the third example.

 

Feel free to download the macro and tailor it to suit your needs.

 

-Neil

Neil Ryan
Sr Program Manager, Community Content

Neil Ryan is the Sr Program Manager, Community Content, responsible for the content in the Alteryx Community. He held previous roles at Alteryx including Advanced Analytics Product Manager and Content Engineer, and had prior gigs doing fraud detection analytics consulting and creating actuarial pricing models. Neil's industry experience and technical skills are wide ranging and well suited to drive compelling content tailored for Community members to rank up in their careers.

Neil Ryan is the Sr Program Manager, Community Content, responsible for the content in the Alteryx Community. He held previous roles at Alteryx including Advanced Analytics Product Manager and Content Engineer, and had prior gigs doing fraud detection analytics consulting and creating actuarial pricing models. Neil's industry experience and technical skills are wide ranging and well suited to drive compelling content tailored for Community members to rank up in their careers.

Comments

Neil

This is a Awsome Macro 🙂

thanks 

Alteryx Partner

What does this mean:

Error: Advanced Join (27): Tool #23: 63 records were generated by an N-Dimensional Join - Join Key: (__a__: 1009:2)

 

i get this error when doing a conventional join with a condition.

Sr. Community Content Manager
Sr. Community Content Manager

@jz For a conventional join the join tool within the macro (tool #23) is set to error on multidimensional joins of more than 16 records to keep the output from blowing up. So you'll either need to address this within your incoming data (for example by removing records that have duplicates of the join key on one side of the join) or you can change the configuration of the join multiple tool #23 within the macro to "Allow all multidimensional joins".

Alteryx Partner

Thnx Neal!

Atom

Thanks, I needed this one. May sound obvious, but with no dates I used condition 1=1. May not be needed, but I love a good combinatorial explosion.

Bolide

Very useful!  I've needed this for joining on date logic for a while now.

Asteroid

this is fantastic! 

 

What if i want to join on a date condition as shown here AND join by, say, a "group id"? 

 

Do the join tool on group id first and then the advanced join back to the prior input? 

Sr. Community Content Manager
Sr. Community Content Manager

The second example in the package attached to the post is doing just that, I believe...

Capture.PNG

Atom

Hi all - 

 

@NeilR, Thanks for this macro; it's very useful!

 

Is there a syntax with which we can set the condition to return the record with the maximum start_date, in the event that multiple start_date's are < date?

 

i.e. In the following table, if the date was 2015-02-15, I would want to return "mcintosh"; but if I had another record where date was 2015-01-28, I would want to return "red delcious"

 

Table 1: 

start_date                         extra_col1              extra_col2

2015-01-01                      apple                      red delicious

2015-02-01                      apple                      mcintosh

2015-03-01                      apple                      granny smith

Bolide

@cando235,

 

You could achieve this a few ways.  Here's one method I use on occasion:

 

  1. Perform the advanced join as you described
    1. extra_col1 = extra_col1
    2. (Condition) start_date  < date
    3. Use Left Join
  2. Attach a summary tool
    1. Group by extra_col1
    2. Group by date
    3. Select Max start_date as Max_start_date
  3. Using a traditional join tool, inner join the summary results to the results from the advanced join:
    1. assuming the advanced results are LEFT and the summary results are RIGHT
    2. (left) start_date = (right) Max_start_date
    3. (left) extra_col1 = (right) extra_col1
    4. (left) date = (right) extra_col1
  4. Select all of the left fields only

 

Link to Example Workflow (can't add attachments in blog comments for some reason)

Here's a preview of the tool layout:

 

advanced_join.png

Meteor

Does this macro works for  Connect In-DB?It seems only works with "Input Data" tools?

pls let me know.

 

Thanks,

Brij

Sr. Community Content Manager
Sr. Community Content Manager

@brij Correct, it only works for the standard in memory tools. For in database joins use the Join In-DB Tool.

Labels