Alteryx Designer Knowledge Base

How to use the ARIMA tool

EricWe
Alteryx

ARIMA stands for Autoregressive Integrated Moving Average. An ARIMA model produces time series forecasts from either a single-variable model or a covariate model. In practice, a model usually includes either AR terms or MA terms; models with both are less common.

Procedure 

Start with a Time Series Decomposition Plot from the TS Plot Tool 

Before configuring the ARIMA Tool, use the TS Plot Tool to investigate the data and decide which ARIMA options suit it, beginning with the decomposition plot. The TS Plot Tool provides the following plots: Time Series, Season, Decomposition, Autocorrelation, and Partial Autocorrelation. For more information, please see https://help.alteryx.com/current/designer/ts-plot-tool.
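
The decomposition plot splits a series into trend, seasonal, and remainder components. To show roughly what that means, the Python sketch below performs a classical additive decomposition: a centered moving average for the trend, then the average detrended value at each position in the season. This is an illustration under that assumption only; the TS Plot Tool may use a different decomposition method, and the function name is hypothetical.

```python
def classical_additive_decomposition(series, m):
    """Split a series into trend, seasonal, and remainder (additive model).

    m is the number of periods per season (for example, 12 for monthly data).
    Trend and remainder are None where the centered moving average cannot reach.
    """
    n = len(series)
    if n < 2 * m:
        raise ValueError("need at least two full seasons of data")
    half = m // 2
    trend = [None] * n
    for t in range(half, n - half):
        if m % 2 == 0:
            # 2 x m moving average: half weight on the two outermost points
            inner = sum(series[t - half + 1 : t + half])
            total = inner + 0.5 * (series[t - half] + series[t + half])
        else:
            total = sum(series[t - half : t + half + 1])
        trend[t] = total / m

    # Average the detrended values at each position within the season
    position_means = []
    for pos in range(m):
        values = [series[t] - trend[t] for t in range(pos, n, m) if trend[t] is not None]
        position_means.append(sum(values) / len(values))

    # Center the seasonal pattern so it sums to zero over one season
    offset = sum(position_means) / m
    pattern = [v - offset for v in position_means]
    seasonal = [pattern[t % m] for t in range(n)]
    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Hypothetical monthly data: linear trend plus a fixed yearly pattern
pattern = [3, 1, -2, -4, -1, 0, 2, 5, 1, -3, -1, -1]
series = [10 + 0.5 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, remainder = classical_additive_decomposition(series, 12)
```

On this synthetic series the recovered seasonal component matches the pattern that was built in, and the remainder is essentially zero.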

Determine the ARIMA terms 

Non-seasonal ARIMA models are written as ARIMA(p,d,q), where:
p = number of autoregressive (AR) terms, that is, lag periods of the series
d = number of differencing transformations applied to the data
q = number of moving average (MA) terms, that is, lagged forecast errors

Seasonal ARIMA models are written as ARIMA(p,d,q)(P,D,Q)m, where:
P = number of seasonal autoregressive terms
D = number of seasonal differences (for example, subtracting the value from the same period one year prior)
Q = number of seasonal moving average terms
m = number of periods in each season (often 12 for monthly data with a yearly season)
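
To make the p term concrete: an AR(1) model, ARIMA(1,0,0), predicts each value from the previous one, y_t = phi * y_(t-1) + e_t. The sketch below simulates such a series and recovers phi with a simple lag-1 least-squares fit. It is only an illustration; the ARIMA tool reports maximum likelihood quantities and its estimation is more involved than this shortcut. The function names and simulation settings are hypothetical.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate y_t = phi * y_(t-1) + e_t with Gaussian noise e_t."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def estimate_ar1(y):
    """Least-squares estimate of phi: regress y_t on y_(t-1) with no intercept."""
    numerator = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    denominator = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return numerator / denominator


series = simulate_ar1(phi=0.7, n=5000, seed=42)
phi_hat = estimate_ar1(series)  # close to 0.7 for a long series
```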

Build and validate the ARIMA model

Commonly, 10-30% of the data is held out as a validation sample in predictive modeling. For time series, this sample is usually the most recent data, and it should include at least as many periods as you are forecasting. Build the model on the earlier data, then validate it with the TS Compare Tool using the ARIMA tool’s Object output and the holdout sample.
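
The holdout rule above (take the most recent share of the data, but never fewer periods than the forecast horizon) can be sketched in Python as follows. The helper name, the 20% default, and the example series are assumptions for illustration.

```python
def holdout_split(series, fraction=0.2, horizon=1):
    """Split a time-ordered series (oldest first) into (training, holdout).

    The holdout is the most recent `fraction` of the data, but never fewer
    than `horizon` periods, the number of periods you plan to forecast.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    n_holdout = max(round(len(series) * fraction), horizon, 1)
    if n_holdout >= len(series):
        raise ValueError("holdout would leave no training data")
    return series[:-n_holdout], series[-n_holdout:]


monthly_sales = list(range(100))  # hypothetical series, oldest first
train, holdout = holdout_split(monthly_sales, fraction=0.1, horizon=12)
# 10% would be only 10 periods, so the 12-month horizon sets the holdout size
```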

When the Model customization tab is not used, the ARIMA tool automatically chooses the best model terms based on the AIC score. When comparing a custom model with the auto-selected model, if the AIC scores are similar, compare the calculated errors to see which model is better.

The Method section shows either the auto-selected terms or custom terms used in the model. 

Coefficients = the estimated regression coefficients of the model terms
Sigma^2 = the MLE (maximum likelihood estimate) of the innovations variance
Log likelihood = the maximized log likelihood of the differenced data, or the approximation to it that was used

Information Criteria
Lower values indicate a better fit.
AIC: Akaike Information Criterion. This measure shows the comparative quality of a statistical model by balancing goodness of fit against model complexity. AIC is used to compare models fitted to the same data. It cannot show that every model is too inaccurate; it only compares the models with each other.
AICc: Akaike Information Criterion, corrected for small sample sizes
BIC: Bayesian Information Criterion
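
For reference, the usual formulas, with log likelihood L, k estimated parameters, and n observations, are AIC = -2L + 2k, AICc = AIC + 2k(k+1)/(n - k - 1), and BIC = -2L + k ln(n). The sketch below computes them and picks the lowest score. How k is counted (for example, whether the innovations variance is included) differs between implementations, so these values may not match the ARIMA tool's output exactly. The log likelihoods in the usage lines are made-up numbers.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion."""
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik, k, n):
    """AIC with a small-sample correction; requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian Information Criterion."""
    return -2.0 * log_lik + k * math.log(n)


# Hypothetical candidates fitted to the same 60 observations: (log likelihood, k)
candidates = {"ARIMA(1,1,0)": (-150.2, 2), "ARIMA(2,1,1)": (-149.8, 4)}
scores = {name: aicc(ll, k, 60) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # lowest AICc wins
```

Here the larger model has a slightly higher log likelihood, but its complexity penalty outweighs that gain, so the smaller model scores better.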

In-Sample Error Measures:
Lower errors indicate a better model (for ME and MPE, which can be negative, values closer to zero are better).
ME: Mean Error is the average difference between actual and forecasted values.
RMSE: Root Mean Square Error is the square root of the average squared difference between forecasted and actual values.
MAE: Mean Absolute Error is the average absolute difference between actual and forecasted values.
MPE: Mean Percent Error is the average percent difference between actual and forecasted values.
MAPE: Mean Absolute Percent Error is the average absolute percent difference between actual and forecasted values. Because it is a percentage, it is useful for reporting.
MASE: Mean Absolute Scaled Error is the mean absolute error of the model divided by the mean absolute value of the first difference of the series.
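
The sketch below computes these measures from actual values and the matching fitted or forecasted values, with error = actual - forecast as in the definitions above. Letting the caller pass the series that MASE is scaled against is an assumption of this sketch, and `error_measures` is a hypothetical helper, not an Alteryx function. Percent measures are undefined when an actual value is zero.

```python
import math


def error_measures(actual, forecast, scale_series=None):
    """Return ME, RMSE, MAE, MPE, MAPE, and MASE (MPE and MAPE in percent)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and the same length")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mpe = 100.0 * sum(e / a for e, a in zip(errors, actual)) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # MASE: MAE divided by the mean absolute first difference of the series
    base = actual if scale_series is None else scale_series
    first_diffs = [abs(base[t] - base[t - 1]) for t in range(1, len(base))]
    mase = mae / (sum(first_diffs) / len(first_diffs))
    return {"ME": me, "RMSE": rmse, "MAE": mae, "MPE": mpe, "MAPE": mape, "MASE": mase}


measures = error_measures([10, 12, 11, 13], [11, 11, 12, 12])
# ME 0.0, RMSE 1.0, MAE 1.0, MASE about 0.6
```

Multiplying every actual and forecasted value by 1,000 multiplies ME, RMSE, and MAE by 1,000 but leaves MPE, MAPE, and MASE unchanged, which is the scale dependence described next.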

Scale-dependent errors should only be used with a single time series scale and cannot be compared across series on a different scale. These include ME, MAE, and RMSE.

Percentage errors, such as MPE and MAPE, are scale-independent and can be used to compare forecasts between different time series data sets. Scale-free errors, such as MASE, are also scale-independent.

Ljung-Box test of the model residuals:
Tests whether the residuals are independent (show no autocorrelation).
A p-value greater than the 0.05 significance level indicates that the residuals can be treated as independent.
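
The Ljung-Box statistic is Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k), where r_k is the lag-k autocorrelation of the n residuals. Q is compared with a chi-square distribution whose degrees of freedom are h minus the number of fitted AR and MA coefficients. The sketch below avoids a statistics dependency by using the closed form of the chi-square upper tail, which only holds for an even number of degrees of freedom. A library routine such as SciPy's `scipy.stats.chi2.sf` handles the general case. The default of h = 10 lags is an assumption.

```python
import math


def autocorrelations(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a mean-centered series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def chi2_sf_even_df(q, df):
    """P(X > q) for X ~ chi-square(df); this closed form needs an even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = q / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ljung_box(residuals, lags=10, fitted_params=0):
    """Return (Q, p-value); fitted_params is the number of AR + MA coefficients."""
    n = len(residuals)
    r = autocorrelations(residuals, lags)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, chi2_sf_even_df(q, lags - fitted_params)


# An alternating pattern is strongly autocorrelated, so p is far below 0.05
q_stat, p_value = ljung_box([1.0, -1.0] * 50, lags=10)
```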

ACF and PACF plots help determine whether the data is stationary, that is, whether its mean and variance are constant over time.

ACF: Autocorrelation Function Plots show the correlation between an observation and its past values. The correlation coefficient is the vertical axis, and the lag number is on the horizontal axis.

The data is stationary if the autocorrelations become much smaller after lag 1 and the mean and variance are unchanging. For stationary data, the mean and variance can be expected to be the same in the future as in the past.

The data is not stationary if the autocorrelations decay slowly toward zero, so that current values remain strongly correlated with recent values. In this case, differencing is needed.

Differencing replaces each value with that value minus the value in the previous period; repeat it until the mean and variance are constant. For seasonal differencing, try a Multi-Row Formula Tool, for example with the expression [Sales] - [Row-12:Sales] for a yearly season of 12 months.
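
First and seasonal differencing are the same operation with a different lag. The Python sketch below mirrors the idea of the Multi-Row Formula expression above: lag 1 gives the ordinary first difference, and lag 12 gives the equivalent of [Sales] - [Row-12:Sales]. It drops the first `lag` values, which have no earlier value to subtract. The helper name and data are hypothetical.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t with a value lag periods earlier."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


sales = [100, 104, 110, 118, 128]
first = difference(sales)     # [4, 6, 8, 10]: still trending
second = difference(first)    # [2, 2, 2]: now constant

# Hypothetical monthly data that rises by 5 each year on top of a yearly pattern
monthly = [m % 12 + 5 * (m // 12) for m in range(36)]
seasonal = difference(monthly, lag=12)  # 24 values, each equal to 5
```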

The dotted lines on the plot are significance bounds: correlations that extend beyond them are statistically significant. Recurring patterns of significant correlations indicate seasonality.
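
Significance bounds on ACF plots are commonly drawn at about ±1.96/√n for n observations, an approximate 95% interval for a series with no autocorrelation. That the TS Plot Tool draws exactly these bounds is an assumption here. The sketch below computes sample autocorrelations and lists the lags that cross the bounds. The example series is hypothetical.

```python
import math


def autocorrelations(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a mean-centered series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def significant_lags(x, max_lag):
    """Lags whose autocorrelation falls outside approximate 95% bounds."""
    bound = 1.96 / math.sqrt(len(x))
    r = autocorrelations(x, max_lag)
    return [k for k, rk in enumerate(r, start=1) if abs(rk) > bound]


quarterly = [1, 2, 3, 4] * 10  # hypothetical series repeating every 4 periods
lags = significant_lags(quarterly, 8)  # [2, 4, 6, 8]
```

The strong positive correlations at lags 4 and 8 are the recurring seasonal pattern; the negative correlations at lags 2 and 6 come from the halfway points of each cycle.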

PACF: Partial Autocorrelation plots show the correlation between the current period and a lag period while controlling for the values at all shorter lags.

If the PACF drops off sharply after lag 1 and the stationary series has a positive correlation at lag 1, AR terms are usually the best choice. If the PACF drops off more gradually after lag 1 and the stationary series has a negative correlation at lag 1, MA terms are usually the best choice.
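
Partial autocorrelations can be computed from the ordinary autocorrelations with the Durbin-Levinson recursion, one standard method among several. The sketch below applies it to a simulated AR(1) series. The PACF is close to the true coefficient at lag 1 and near zero afterward, which is the sharp drop-off that points to AR terms. The simulation settings (coefficient 0.7, 5,000 points, seed 7) are arbitrary assumptions.

```python
import random


def autocorrelations(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a mean-centered series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def partial_autocorrelations(x, max_lag):
    """PACF at lags 1..max_lag via the Durbin-Levinson recursion."""
    r = autocorrelations(x, max_lag)  # r[k - 1] is the lag-k autocorrelation
    pacf = []
    prev = []  # prev[j - 1] holds phi_(k-1, j) from the previous step
    for k in range(1, max_lag + 1):
        num = r[k - 1] - sum(prev[j - 1] * r[k - j - 1] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * r[j - 1] for j in range(1, k))
        phi_kk = num / den
        # Update the coefficients for order k (the right side uses the old prev)
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


rng = random.Random(7)
ar1 = [0.0]
for _ in range(4999):
    ar1.append(0.7 * ar1[-1] + rng.gauss(0.0, 1.0))
pacf = partial_autocorrelations(ar1, 5)
# pacf[0] is near 0.7; pacf[1:] are near 0
```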

Which model is the best, ARIMA or ETS? Try both and use the TS Compare tool. It selects the model with the best AIC or AICc score. 

For information on how to use the ETS tool, please see: https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/How-to-use-the-ETS-tool/ta-p/549683.

Forecast 

Based on the validation testing, use the best model in the TS Forecast tool. Before using this tool, add the validation sample back to the data set. Note: you can highlight an area of a forecasted plot in the TS Forecast tool to zoom into that area.

A residual is the difference between the observed value and the forecasted value. A good forecasting model leaves uncorrelated residuals with a mean close to 0; otherwise, the forecasts will be biased. If needed, add the mean of the residuals to all of the forecasts to correct this issue.
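
A minimal sketch of that correction, assuming residuals are defined as observed minus forecast, so a positive mean means the forecasts run low. The helper name and numbers are hypothetical.

```python
def bias_adjust(forecasts, residuals):
    """Shift every forecast by the mean residual (observed - forecast)."""
    mean_residual = sum(residuals) / len(residuals)
    return [f + mean_residual for f in forecasts]


adjusted = bias_adjust([100.0, 110.0], residuals=[2.0, 4.0, 0.0, 2.0])
# mean residual is 2.0, so adjusted == [102.0, 112.0]
```

A constant shift only removes the bias; it does not fix residuals that are still autocorrelated, which the Ljung-Box test checks.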

After installing the Predictive Tools, sample Time Series workflows are available under Help > Sample Workflows > Predictive tool samples. There are help pages for the Time Series tools, as well as recorded training sessions. There is also a free Time Series Forecasting course on the alteryx.com Resources page. Please see the links below.

Credit and thanks go to Bhumika Patel for the idea for this article and much of its content.

Additional Resources

https://help.alteryx.com/current/designer/time-series
https://community.alteryx.com/t5/Videos/Time-Series-Analysis/td-p/114070
https://community.alteryx.com/t5/Videos/Time-Series-Modeling/td-p/256421
https://www.alteryx.com/resources/resource-library/predictive-training 