Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

How to Join on a Date Range

Ozzie
Alteryx
Alteryx
Created

A frequent use case we get asked about is how to join two data sets on a date range where one data set has a start date field and an end date field and the other dataset has a just a date field. While at first this may seem like a tough problem since the join tool only blends the data together if the records on the fields you’re joining on match. But really, all we need to do is find a way to generate the missing dates in the range so that we can get a match from both data sets. Luckily there is a tool for that called Generate Rows.

 

Below is an example I created in 10.6 that illustrates two ways to use generate rows to perform a join on a date range.

join1.jpg

 

In this workflow we are joining a set of customer transaction data and we want to join on a second dataset that tells us when and how long a particular sale went on.

datasetone.jpg                                                                            datasettwo.jpg        

First dataset                                                                                                                            Second dataset 

 

 

 

The first way we can solve this problem is use generate rows directly on our second dataset with this configuration:

generate rows.jpg

 

To make sense of this to those who are new to generate rows:

We are creating a new field in the second dataset called “Generate Date”.

Our initial value will start when the sale started so we will use “Sale start” as our Initialization Expression. We will create new rows for every record until “Generate Date” equals the end of the sale or “Sale End”, thus our condition expression is [Generated Date]<=[Sale End]. Since we want to make sure we don’t skip any days in between the start and end date we will make sure we only add 1 day to each new instance of “Generate Date” thus we use DateTimeAdd([Generated Date],1,”days”) for our Loop Expression.

To show you what this looks like, for our first record since the sale went from 02/01/2015-02/07/2015, we now have 7 duplicate records where the only thing that changes between them is the generated date for which we now have every date in the range in its own separate record. If you’re still having trouble understanding how this was done there is an excellent example that is almost the exact same in the sample workflow for Generate Rows. To access that simple right click on Generate Rows in the tool palette and click “Open Example”.

 

 

generateddate.jpg

 

 

Now that we have every date between the ranges, we can do a join with “Generated Date” and the “Transaction Date”.

And voila, we now have successfully joined the two datasets.

 

jpined.jpg

 

A potential issue of this is if you are working on a very large dataset with a lot of duplicate ranges this can put a huge damper on your performance and may cause errors due to limitations on your machine. In the example above 5 records became 37 after generate rows happened. Obviously depending on your ranges this can really cause your dataset to increase exponentially. If you’re working with an abnormally large dataset I would suggest doing the method below as it won’t increase the size of any of your data sets.

 

This method involves just two more steps.

 

join2.jpg

 

After the dataset with the ranges we perform a summarize and groupby by both “Sale start” and “Sale end”

 

 

groupby.jpg

 

This will give you every unique combination of ranges in case you have multiple records with the same records. We then use generate rows with the same exact configuration as above and we should now have our “Generated Date field”. Now that we have this field, we can join it back to our customer dataset with the transaction dates.

 

From here though we have to perform one more join since we are missing the other information that was dropped out from summarizing the dates. This time we join on the Sale start and Sale End fields as we want to exact matches to those records.

 

Attachments
Comments
patrick_digan
16 - Nebula
16 - Nebula

I'll add that the advanced join can sometimes be used in cases like this as well (when your data is small). @WayneWooldridge had a nice article on it.

 

Capture.PNG

Saarek
7 - Meteor

Whenever I try this, copying it to the abolute letter, I get the following error when entering in the loop expression: Parse Error at Char(30): Malformed function call.

***Managed to make it work in the end***

faith_pn
6 - Meteoroid

One suggestion, don't copy/paste the text from this post into your Generate Rows tool. You'll get a Parse Error. Instead, type it into your Generate Rows tool manually.