Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

Advanced Join: value from one file is between, > or < a value from another file

WayneWooldridge
Alteryx Alumni (Retired)
Created

It's not uncommon to have a situation where you need to conditionally join one dataset to another.  Perhaps the most common is when you want to join one file to another where a date from the first file is between, greater than or less than a date(s) on a second file.  The default tools found in the Join section of the tool palette don't provide a simple way of doing this (it can be done, but you need to string several tools together to make it work.  There is a better way!  Read on...). 

 

There is great macro available in the public Alteryx gallery called Advanced Join (find it here, but spoiler alert... you can download the attached workflow which includes this macro so you don't have to go to the gallery to get it).  The Advanced Join gives you greater latitude than the Join tool.  Most notably, you can select records from file A that are unique to A AND intersect with file B.  Now you may be thinking, “I can do that by unioning the records from an inner join with records from a left join,” and you would be correct.  But it takes two tools to do what one Advance Join does.  More importantly, the Advanced Join allows you to put a conditional statement on your join which is something you can't do with the Join tool.  And it’s this feature - the ability to use conditional statements in a join - which we will focus on for our purpose here.

 

Let's get into some examples.  I have a file, 'Fruit List’, which contains data about various fruits.  This file contains a Column Id, a Fruit Name, a Start DateTime and an End DateTime:

 

AdvancedJoin 1.png

 

I have a second file, 'Greek Alphabet’, which contains a Column Id, a Greek letter and a Datetime.

 

AdvancedJoin 2.png 

 

I want to join the two files on ColumnId where the Datetime from Greek Alphabet (file B) is BETWEEN Start Datetime and End Datetime from Fruit List (file A).  Here's the workflow and a screenshot of how to configure the Advanced Join:

 

 AdvancedJoin 3.png

AdvancedJoin 4.png

 

 

And here are what my results look like:

 

AdvancedJoin 5.png 

 

Only one record from Greek Alphabet matched one from Fruit List on ColumnId where Greek Alphabet's Datetime was between Fruit List's Start Datetime and End Datetime.

 

In the next example, I have the same Fruit List file and want to join it another file, Greek Alphabet that contains just one datetime filed:

 

 AdvancedJoin 6.png

 

The first thing to note is both files have a field called 'DateTime.'  We'll want to give these unique names to avoid ambiguity when we write our conditional state in the Advance Join configuration. 

 

AdvancedJoin 15.png

 AdvancedJoin 7.png

 

 

AdvancedJoin 16.png

AdvancedJoin 8.png

 

 

I want to join both files on ColumnId but only when DateTime from Fruit List is LESS THAN DateTime from Greek Alphabet:

 

AdvancedJoin 9.png

AdvancedJoin 10.png 

 

 

And the results...:

 

 AdvancedJoin 11.png

 

Let's look at one last example.  This time, I'm going to use the Fruit List and Greek Alphabet files used in the first example (Fruit List has a Start DateTime and an End DateTime).  I'm interested in matching records where DateTime from Greek Alphabet is BETWEEN Start Datetime and End DateTime from Fruit List.  I'm not matching on ColumnId this time.

 

AdvancedJoin 12.png

 

For the Advanced Join configuration, I'm going to cross join my files.  (CAUTION: the resulting join could contain as many rows as the product of the number of rows of the incoming datasets - a.k.a. Cartesian join - depending on how restrictive your conditional is.  This means if you're joining 2 datasets that contain a million records each, the resulting dataset could contain as many as one trillion records! ).  If I had wanted to match on ColumnId, I would have had to do that separately using a Join tool. The cross join option only allows you to apply a conditional statement:

 

AdvancedJoin 13.png 

 

Results from our 3rd example:

 

AdvancedJoin 14.png

 

Notice how 10 records from Greek Alphabet were joined to just one record from Fruit List. 

 

 The Advanced Join tool can save you time and a lot of headaches when you want to join files using a conditional statement.  It has some limitations - you can only join two datasets and include one conditional statement per tool, cross join limitation mentioned above - but Advanced Join provides greater capability and flexibility than the standard Join tool.

Attachments
Comments
JohnJPS
15 - Aurora

 I ran into a situation recently where I had two tables with a conditional join that involved a column from TableA being bound by two columns from TableB and another column from TableA satisfying an inequality from TableB.  Also, TableA had over 100 columns, so any approach that involved joining back to TableA on all TableA columns was rather tedious.

 

A very easy solution - as long as TableA isn't too huge - is to use sqldf in the R tool.  This should also be considered for most any complex join condition.  My table as noted had over 100 columns and also over a million rows.  The sqldf approach wasn't fast - about 3 minutes to complete, and also gave one of those weird errors that isn't an error (see here)... but was very easy to implement and solved my problem.

Jake_Jake_Jake
7 - Meteor

The first example makes an assumption that does not translate well for broader problems.  It assumes that each Greek letter can only occur once.  This may be true in that context, but suppose the below tables are what you are working with and you need to calculate total sales for each team by month.  In the above example it will error out with this message "Tool #23: ....records were generated by an N-Dimensional Join"

 

 

Sale_IDSales PersonSale DateSale Amt
1Natasha1/1/201896
2Steve1/1/201836
3Tony1/1/201875
4Natasha1/2/201843
5Steve1/2/201838
6Tony1/2/201859
7Natasha1/3/201824
8Steve1/3/201879
9Tony1/3/201866
10Natasha1/4/201860
11Steve1/4/201839

 

...........

 

And 

 

Team MemberTeamStartEnd
NatashaAAA1/1/20183/1/2018
NatashaBBB3/2/20184/17/2018
NatashaCCC4/18/20188/8/2018
NatashaDDD8/9/201812/31/2099
SteveDDD1/1/20185/1/2018
SteveAAA5/2/20187/14/2018
SteveCCC7/15/201812/31/2099
TonyCCC1/1/20181/28/2018
TonyBBB1/29/20183/2/2018
TonyAAA3/3/201811/25/2018
TonyDDD11/26/201812/31/2099

 

 

 

The way around this issue is to use the Advanced Join tool to do a cross join with this condition:

 

Team_Member = Sales_Person AND Start <= Sales_Date AND Sales_Date <= End

 

This is the result:

 

Team_MemberTeamStartEndSale_IDSales_PersonSales_DateSale Amt
NatashaAAA1/1/20183/1/20181Natasha1/1/201896
SteveDDD1/1/20185/1/20182Steve1/1/201836
TonyCCC1/1/20181/28/20183Tony1/1/201875
NatashaAAA1/1/20183/1/20184Natasha1/2/201843
SteveDDD1/1/20185/1/20185Steve1/2/201838
TonyCCC1/1/20181/28/20186Tony1/2/201859
NatashaAAA1/1/20183/1/20187Natasha1/3/201824
SteveDDD1/1/20185/1/20188Steve1/3/201879
TonyCCC1/1/20181/28/20189Tony1/3/201866

 

....

 

 

You then can remove unnecessary columns and do a simple aggregation to figure out how much each team did in sales by month.

 

 

Of course if both of these tables are in the same database it is much easier just to write this logic in join statement instead of all these weird work arounds :)

 

bvinaya
5 - Atom

Hi All,

 

I have trying to do an advanced Conventional Join with a condition that visit date is between Start date and End date if the Join field matches.The data types of the join field is int16. But, i keep receiving an error " records were generated by an N- Dimensional Join - Join Key:(_A_:3)"

 

 

Please let me know if i am missing on anything. Screenshots attached below.

 

Thank you in advance!

 

ADVANCED JOIN ERROR.PNGaDVANCE JOIN ERROR MESSAGE.PNG

NeilR
Alteryx Alumni (Retired)

@bvinaya this has come up before here. Here was my response...

For a conventional join the join tool within the macro (tool #23) is set to error on multidimensional joins of more than 16 records to keep the output from blowing up. So you'll either need to address this within your incoming data (for example by removing records that have duplicates of the join key on one side of the join) or you can change the configuration of the join multiple tool #23 within the macro to "Allow all multidimensional joins".

bvinaya
5 - Atom

Hi NeilR,

 

Thank You for getting back with me on this!

It's worked. :)

 

 

Thank You!

Martyn
9 - Comet

Where ranges are involved, although it may seem a little fiddly to do, you could generate spatial objects (trade areas and points) to represent date ranges and dates (you can decide on the scale to use). You can then use the Spatial Match tool to effectively join on the date and date range (date is target, date range is universe, and you select target within universe).

 

This can also be applied to other ranges and points (not just dates).

jstney
5 - Atom

Hi NeilR, 

 

How do I change the configuration of the join multiple tool #23 within the macro to "Allow all multidimensional joins"?

NeilR
Alteryx Alumni (Retired)

@jstney :

 

  1. Right-click the Advanced Join tool and select Open Macro
  2. Click on the Join Multiple tool (the top purple one)
  3. Click the drop down to change from "Error on multidimensional joins of more than 16 Records" to "Allow All multidimensional Joins"
  4. Save