Alteryx Designer Desktop Discussions
SOLVED

Timeseries analysis and data aggregation

jonathanogrady
8 - Asteroid

Hello Alteryx fans!

 

I'm getting to grips with Timeseries and I have a question regarding the frequency of my observations versus seasonality in my data.

 

If we are looking at online sales for example, there may be a seasonality according to time of day, day of week and time of year.

 

I have three years of hourly observations. The data is volatile, but there is a clear trend which shows a peak during the afternoon, and a general peak around Christmas and a dip in the summer.

 

What level of aggregation should I use for my forecasting? Does it make a difference? How do I get the best results?

 

I attach a sample dataset to help answer the query.

 

I'd be very grateful for any wise input from the community!

 

Best wishes,

 

Jonathan

3 REPLIES
jonathanogrady
8 - Asteroid

I notice that nobody has had the opportunity to consider my questions above.

 

I hope they are not too vague/obscure?

 

I've been through the various documentation regarding timeseries, and the maths quickly becomes hieroglyphics.

 

In simple terms, I was hoping that somebody might be able to explain to me the relationship between aggregation and forecasting.

 

When I forecast forward using hourly data, I get a good pattern for what happens over a day, but it seems to lose all fidelity with respect to longer time periods.

 

Conversely, if I aggregate up to weekly data, I get some seasonality, but I lose that day on day level of detail.
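
To make the trade-off concrete, here is a minimal sketch of rolling hourly observations up to daily and weekly totals. It is plain Python rather than an Alteryx workflow, and the `aggregate` helper and the two weeks of made-up sales figures are assumptions for illustration only, not the attached dataset.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate(observations, level):
    """Sum (timestamp, value) pairs into 'day' or 'week' buckets."""
    totals = defaultdict(float)
    for ts, value in observations:
        if level == "day":
            key = ts.date()
        elif level == "week":
            # Bucket by the Monday that starts the week containing ts.
            key = (ts - timedelta(days=ts.weekday())).date()
        else:
            raise ValueError(f"unsupported level: {level}")
        totals[key] += value
    return dict(sorted(totals.items()))


# Hypothetical data: two weeks of hourly sales with an afternoon peak.
start = datetime(2024, 1, 1)  # a Monday
hourly = [
    (start + timedelta(hours=h), 10.0 + (5.0 if 12 <= h % 24 < 18 else 0.0))
    for h in range(14 * 24)
]

daily = aggregate(hourly, "day")
weekly = aggregate(hourly, "week")

# The afternoon peak is visible in the hourly values but is folded
# into a single total per day or per week after aggregation.
print(len(hourly), len(daily), len(weekly))
```

The totals are preserved at every level; what changes is which patterns remain visible. The within-day shape disappears as soon as the data is summed to days.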

 

What is a good rule of thumb here? I notice that the forecast periods toggle only goes to 365. I have three years of hourly data here and I would like to project forward say six months. Seems reasonable?
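
As a rough check on the 365-period limit, the arithmetic below counts how many forecast periods a six-month horizon needs at each level of aggregation. The 182-day approximation of six months and the `periods_needed` helper are assumptions for illustration.

```python
MAX_PERIODS = 365   # the forecast-period limit mentioned above
HORIZON_DAYS = 182  # assumption: six months is roughly 182 days


def periods_needed(horizon_days, level):
    """Number of forecast periods required to cover the horizon."""
    if level == "hour":
        return horizon_days * 24
    if level == "day":
        return horizon_days
    if level == "week":
        return -(-horizon_days // 7)  # ceiling division
    raise ValueError(f"unsupported level: {level}")


for level in ("hour", "day", "week"):
    n = periods_needed(HORIZON_DAYS, level)
    verdict = "fits" if n <= MAX_PERIODS else "exceeds the limit"
    print(f"{level}: {n} periods ({verdict})")
```

Under these assumptions, a six-month horizon fits within the limit at daily or weekly aggregation but not at hourly, where 365 periods cover only about 15 days.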

 

Don't really know how to think about it and could do with some guidance if anyone feels so inclined…

 

Much appreciated!

 

Jonathan

DrDan
Alteryx Alumni (Retired)

Hi Jonathan,

 

This isn't really a technical issue; rather, it relates to the decision the forecast is being used to address. If the issue being addressed is "How many web servers should we have spun up?", then the time interval of the forecast should match the time it takes to spin servers up or down (my hunch is that this will correspond to an hourly model). On the other hand, if the question is "How large an inventory should we have on hand?", then an hourly time interval would be too granular, and you would want one on the scale of inventory replenishment (my hunch is that this would be weekly or monthly).

 

If the question being addressed is the former, then I would think about using an ARIMA model with covariates. The covariates would target the macro-level seasonal effects (e.g., the Christmas rush and the summer dip, which could likely be handled with month indicator variables) and the day-of-week effects (handled with day-of-week indicator variables), while the ARIMA terms would handle the time-of-day effects.
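
The month and day-of-week indicator variables described above could be built along these lines. This is only a sketch in plain Python: the `indicator_covariates` helper and the choice of January and Monday as baseline levels are assumptions, and fitting the ARIMA model itself is not shown.

```python
from datetime import datetime


def indicator_covariates(ts):
    """Return month and day-of-week dummy variables for one timestamp.

    January and Monday are the omitted baseline levels, which gives
    11 month columns followed by 6 day-of-week columns.
    """
    month = [1 if ts.month == m else 0 for m in range(2, 13)]
    day_of_week = [1 if ts.weekday() == d else 0 for d in range(1, 7)]
    return month + day_of_week


# One row per hourly observation; only two example timestamps here.
christmas = indicator_covariates(datetime(2024, 12, 25, 15))  # a Wednesday
baseline = indicator_covariates(datetime(2024, 1, 1, 9))      # a Monday in January

print(christmas)
print(baseline)
```

One level of each factor is left out so the indicators are not perfectly collinear with the model's intercept. Every hour within the same day gets the same covariate row, so the time-of-day pattern is left for the ARIMA terms to capture.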

 

Dan

jonathanogrady
8 - Asteroid

Doctor Dan!

 

Thank you very much for taking the time to look at my question.

 

I suppose I was trying to figure out how far forward it is reasonable to forecast, and the relationship between this and the level of aggregation.

 

I read that you said before that, as a rule of thumb, you need 3-6X the forecast horizon for a decent sample; in other words, 3 to 6 years of data to forecast a year. I just wasn't sure how that also related to aggregation.

 

Anyway, thanks for giving it a whirl. Looks like it's time for me to buy a book!

 

Best wishes,

 

Jonathan
