Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
ChrisF
Alteryx Alumni (Retired)


dateaggregator.png

Time series data is used in many of the industries we work with here at Alteryx. In some cases, we start with data that’s been measured by day and find that it’s a little too granular. The obvious solution here is to start rolling up the daily data into larger groups and experimenting with different aggregation levels, but this can quickly become difficult and/or tedious depending on the kind of aggregation you’re trying to do.

 

With that in mind, I decided to make a tool that would handle a lot of the work for you. The Date Aggregator allows you to take per-day time series data and roll it up into all sorts of different intervals. You can aggregate the data using one of 4 methods: Sum, Mean, Median, or Mode. Once you’ve chosen an aggregation method, the next step is to select an aggregation period. The Date Aggregator allows you to roll your data up according to several different intervals, all of which fall under two categories: Calendar Intervals and Static Intervals. In this case, a static interval is simply a number of days. For example: “I want to take this data and calculate the average for every 4 days.” This part is fairly easy, and as you’ve probably guessed, can be done with a Multi-Row Formula and a Summarize tool.

 

Calendar intervals are more difficult. The basic idea is that instead of saying “I want to see what every 4 day period looks like”, you’re saying “I want to see what every week looks like, and also I’d like my week to start on Tuesday because that’s when my sales cycle starts.” Furthermore, you could be interested in looking at data by month, or quarter, or several other time periods. The key distinguishing factor of calendar intervals is that we actually care about things like the number of days in each month, whether or not it’s a leap year, and what my company actually defines as the beginning of the week.

 

To handle the calendar intervals, we make heavy use of Alteryx’s date parsing functions by taking the incoming date field and breaking it up into all the pieces we’re interested in (e.g. what’s the day of the week, what month does the day fall in, what year are we talking about, etc.). From there, we can use a series of formulas to create a grouping field based on the parameters the user selected. For instance, if we want to see the data per quarter, the macro will pull out the month and year from each row, then bin it by quarter based on the month (e.g. IF MONTH <= 3 THEN ‘Q1’) and finally create a “quarter” field by concatenating the quarter and year for each row. By keeping the year around, we’re able to ensure that the macro doesn’t go overboard with the aggregation process and group every Q1 row together.

 

You can also specify an additional Group By field in case your data is measured across several different units (like store number, for example). If a Group By field is specified, the macro will first condense your data down so that each date/group by value pair has one unique record. After that, the macro proceeds to aggregate the data according to the macro’s configuration, running in batches for each element in the group by field so that you end up getting, for example, a per-week average for each store ID in your data. All told, the end result is that the Date Aggregator makes the question “What do my average sales look like on a per-week basis for each of my stores, with the week starting on Tuesday?” much easier to answer.

 

If you’d like to try the macro out, or just crack it open and see how it works, feel free to head over to the Macro District and download it here: Date Aggregator Macro

 

Thanks for reading!

 

-Chris

Comments