I'm back with another question, and I was hoping a kind soul out there might share their wisdom?
I'm getting to grips with ARIMA forecasting using the time series tools.
My question concerns the relationship between the dates under review and the resulting forecast.
I attach an example to illustrate my point. I have three years of daily data which I have consolidated into weekly numbers.
In the top workflow, I specify an end date of week 26 December 2016, and the resulting forecast has a very pleasing oscillation.
In the bottom workflow, I specify an end date of 19 December 2016, and the resulting forecast is a deadly flat line of the mean.
But here's the rub: if I specify an end date of week 26 December, I am only including six days in my final weekly aggregation. This is not ideal, so my preferred end date is week 19 December, which includes a full seven-day week.
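For anyone wanting to check for partial weeks outside Alteryx, here is a minimal Python sketch of the aggregation described above. It assumes Monday-start weeks (19 and 26 December 2016 are both Mondays) and uses made-up daily values; the function names `weekly_totals` and `complete_weeks` are hypothetical and are not part of any Alteryx tool.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily):
    """Sum daily values into Monday-start weeks and count the days in each week.

    `daily` is an iterable of (date, value) pairs.
    Returns a list of (week_start, total, n_days) tuples sorted by week_start.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, value in daily:
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        totals[week_start] += value
        counts[week_start] += 1
    return [(wk, totals[wk], counts[wk]) for wk in sorted(totals)]


def complete_weeks(weeks, days_per_week=7):
    """Keep only the weeks that contain a full set of daily observations."""
    return [w for w in weeks if w[2] == days_per_week]


# Hypothetical data: one unit per day from 12 Dec to 31 Dec 2016.
start = date(2016, 12, 12)
daily = [(start + timedelta(days=i), 1.0) for i in range(20)]

weeks = weekly_totals(daily)   # last entry is the six-day week of 26 Dec
full = complete_weeks(weeks)   # last entry is the full week of 19 Dec

for week_start, total, n_days in weeks:
    print(week_start, total, n_days)
```

Filtering on the day count rather than on a hard-coded end date means the same check still works when the data is refreshed.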
Why does one workflow give me a very pretty oscillating forecast, yet the other is the awful flat line? Is there a simple explanation for this quirk?
Hey @jonathanogrady! It looks as though Alteryx fit two different models based on the information fed into the ARIMA tools. When December 26th was included, Alteryx fit an ARIMA(1,0,1)(1,0,0) model which has 3 coefficients and an intercept. This can be seen in the R output from the ARIMA tool. On the other hand, when the last date included was December 19th, Alteryx fit an ARIMA(1,0,1)(0,0,0) model which only has 2 coefficients and an intercept term. Because different models were generated, you get different forecasts. Hope this helps!
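For reference, the two models can be written in the usual backshift notation, as a sketch rather than a copy of the tool's output. Here B is the backshift operator, mu is the mean (which R reports as the "intercept"), phi_1 and theta_1 are the non-seasonal AR and MA coefficients, Phi_1 is the seasonal AR coefficient, and s is the seasonal period. The value of s is not stated in this thread; 52 would be a natural assumption for weekly data with a yearly cycle.

```latex
% ARIMA(1,0,1)(0,0,0): two coefficients plus the mean
(1 - \phi_1 B)\,(y_t - \mu) = (1 + \theta_1 B)\,\varepsilon_t

% ARIMA(1,0,1)(1,0,0)_s: three coefficients plus the mean
(1 - \phi_1 B)\,(1 - \Phi_1 B^{s})\,(y_t - \mu) = (1 + \theta_1 B)\,\varepsilon_t
```

The only difference is the extra factor (1 - Phi_1 B^s), which ties each value to the value one season earlier.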
Hey @jonathanogrady. If you run the workflow that you attached to your original post, the R output from the ARIMA tool is already included. You can view the output in the middle Browse tools in your 'Selected Model: Arima' Tool Containers. I included screenshots below for your reference.
This is the top one with the 2016-12-26 date included. You can see in #2 here on the left that Alteryx chose an ARIMA(1,0,1)(1,0,0) model for the data that was fed into the tool.
Here is your second set of tools where 2016-12-26 was excluded. Again, because you told Alteryx to auto-fit a model, it chose what it thought fit the incoming data the best. Here, that is an ARIMA(1,0,1)(0,0,0) model. Because different data points were used to create this model, Alteryx fit a different one to it than it did to your first set of data. Unfortunately, this different model's predictive power does not seem to be as good as the first model's, as you pointed out.
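To see why the non-seasonal model flat-lines, it may help to look at the forecast recursions directly. The sketch below is plain Python, not the Alteryx or R implementation, and every number in it is hypothetical (a toy series with period 4 and made-up coefficients, not the values fitted in this thread). Without a seasonal term, each forecast is just the previous one pulled a fraction phi closer to the mean; with a seasonal AR term, each forecast also borrows from the value one season earlier, so the pattern repeats.

```python
def forecast_arma11(last_value, last_resid, mu, phi, theta, horizon):
    """Point forecasts for (y_t - mu) = phi*(y_{t-1} - mu) + e_t + theta*e_{t-1}."""
    forecasts = []
    prev_dev = last_value - mu
    for h in range(1, horizon + 1):
        # Only the first step sees the last residual; future shocks have mean 0.
        ma_part = theta * last_resid if h == 1 else 0.0
        prev_dev = phi * prev_dev + ma_part
        forecasts.append(mu + prev_dev)
    return forecasts


def forecast_sarma(history, last_resid, mu, phi, theta, seas_phi, period, horizon):
    """Point forecasts for (1 - phi*B)(1 - seas_phi*B^period)(y_t - mu) = (1 + theta*B) e_t.

    Requires len(history) >= period + 1.
    """
    dev = [y - mu for y in history]  # deviations from the mean
    n = len(dev)
    for h in range(1, horizon + 1):
        t = n + h - 1  # index of the value being forecast
        ma_part = theta * last_resid if h == 1 else 0.0
        dev.append(
            phi * dev[t - 1]
            + seas_phi * dev[t - period]
            - phi * seas_phi * dev[t - period - 1]
            + ma_part
        )
    return [mu + d for d in dev[n:]]


# Hypothetical series: three repeats of a period-4 pattern around a mean of 10.
history = [10.0, 14.0, 10.0, 6.0] * 3

flat = forecast_arma11(history[-1], 0.0, mu=10.0, phi=0.5, theta=0.3, horizon=8)
seasonal = forecast_sarma(history, 0.0, mu=10.0, phi=0.5, theta=0.3,
                          seas_phi=0.8, period=4, horizon=8)

print([round(f, 2) for f in flat])      # decays towards the mean
print([round(f, 2) for f in seasonal])  # keeps oscillating
```

With these toy numbers the first series settles on the mean within a few steps, while the second keeps swinging with the period of the history, which is the same contrast as between the two workflows.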
This is where a little statistical background comes in handy: if you set the parameters in the ARIMA tool manually, you can get rid of the flat-lining that occurs when you try to forecast. Use your list of 155 dates (the one that does not include 2016-12-26) and use the second tab in the configuration of the ARIMA tool to specify your model manually.
It seems as though you need a seasonal component in the model. In the above picture, I just went with the ARIMA(1,0,1)(1,0,0) model and here is the output I got:
This improved the predictive capability even when excluding the one date. The sigma^2 also decreased with this new model. If you want to modify the number of parameters, feel free to mess around with that, but remember the idea of parsimony (the simpler, the better - when possible). Hopefully that all makes sense! Let me know if that's what you're looking for.
Thank you so much for going to such trouble to answer my question.
I didn't realise that the summary output could be read as the model's R configuration. Thank you for pointing that out.
I have used the model customisation before, but the words of Doctor Dan are ringing in my ears: Alteryx research shows that most users can rarely improve on the out-of-the-box forecast unless they really know what they're doing!
I created 4 versions and got the following results:
Up to 19 December, out-of-the-box: Sigma^2 = 58.6 million
Up to 19 December, customised: Sigma^2 = 50.1 million
Up to 26 December, out-of-the-box: Sigma^2 = 50.4 million
Up to 26 December, customised: Sigma^2 = 50.4 million
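As a quick sanity check on those four figures, the relative change in Sigma^2 can be worked out in a few lines of Python. The values are the ones reported above, in millions; the helper name `pct_reduction` is just for illustration.

```python
# Sigma^2 values reported above, in millions.
sigma2 = {
    ("19 Dec", "out-of-the-box"): 58.6,
    ("19 Dec", "customised"): 50.1,
    ("26 Dec", "out-of-the-box"): 50.4,
    ("26 Dec", "customised"): 50.4,
}


def pct_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


reductions = {
    end: pct_reduction(sigma2[(end, "out-of-the-box")], sigma2[(end, "customised")])
    for end in ("19 Dec", "26 Dec")
}

for end, pct in reductions.items():
    print(f"Up to {end}: customised model changes Sigma^2 by {pct:.1f}%")
```

So the customised model cuts Sigma^2 by roughly 14.5% on the data ending 19 December and makes no difference on the data ending 26 December, where the automatic fit already had the same structure.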
I was able to replicate your fitted result using the out-of-the-box tool, so long as I choose 26 December.
I suppose my question centres on why the ARIMA tool creates such radically different models based on six days' difference in three years' worth of data.
Perhaps this is one we will have to chalk up to "the algorithm"!
@jonathanogrady I'm glad you were able to make some progress on this! I think that because the last week of that final year was not included, the model was not able to pick up the seasonal component in the data set. (Or maybe it did, but the final model you got 'beat' it as the best pick based on other parameters.) Therefore, when fitting the best model, it only included one auto-regressive and one moving average term. When you forced the model to also have a seasonal component, it was able to fit your data and still give reasonable forecasts. Note that this is not necessarily a quirk of the Alteryx ARIMA tool, but rather a feature of time series model building itself. If you do customize your model, make sure you continue to check the assumptions of ARIMA models, for example by looking at the ACF plots. Hope this was helpful!
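For the ACF check mentioned above, here is a rough sketch of what the plot is measuring, in plain Python. It computes the sample autocorrelation of a residual series and flags lags that fall outside the usual approximate 95% white-noise band of 1.96/sqrt(n). The residuals are hypothetical (a deliberately leftover period-4 pattern), and the function names are made up for illustration; in practice you would read this off the ACF plot that the tool produces.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations of `series` at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1) if abs(r) > bound]


# Hypothetical residuals that still contain a repeating period-4 pattern.
residuals = [1, 0, -1, 0] * 10

flagged = significant_lags(residuals, max_lag=4)
print(flagged)
```

If the residuals of a fitted model still show large spikes at the seasonal lag, that is a hint that a seasonal term is missing, which is the situation the customised model was addressing.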