Hi everyone, can someone guide me on how to use XGBoost regression for time series data?
The general sentiment is that XGBoost isn't meant for time series applications. It can be done, but would require transformation/modification of the input data and procedures.
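As a rough illustration of the kind of input transformation meant here, one common approach is to turn each series into lagged feature rows so that a regressor can learn from the previous values. This is a minimal sketch, not a recommended pipeline: the function name `make_lag_features`, the choice of 3 lags, and the cost values are all made up for the example.

```python
from typing import List, Tuple

def make_lag_features(series: List[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a univariate series into (features, target) rows for a regressor.

    Each feature row holds the n_lags values that precede the target value.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # oldest lag first, most recent last
        y.append(series[t])
    return X, y

# Made-up monthly costs for one customer (illustrative only)
costs = [100.0, 110.0, 105.0, 120.0, 125.0, 130.0]
X, y = make_lag_features(costs, n_lags=3)
# X[0] == [100.0, 110.0, 105.0] and y[0] == 120.0
```

Rows like these could then be passed to a regressor such as `xgboost.XGBRegressor` through `fit(X, y)`. Forecasting more than one step ahead would additionally require feeding each prediction back in as a lag, which is part of the procedural change mentioned above.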
Is there a reason why you're looking to boosted models instead of the time series tools available in Alteryx?
Hi @CharlieS The reason for using a boosting model is that the time series tools in Alteryx are not customizable enough for the data I have.
Could you tell/show/share more about your current data? A sample would be best if you can share that (even with dummy data).
This sounds like an opportunity to transform the data for analysis. I think that's a better path to take rather than forcing the current data into alternative modeling methods.
Hi @CharlieS The sample data consists of 3 years (2018 to 2020) of historical data for 6 customers, with 12 months per year per customer. I need to forecast the cost amount for every customer for every month of 2021.
You can download the TS Model Factory tool to answer your question. It builds a separate time series model for each group.
I assume that in practice you will feed all three years of data to the model rather than using an average, otherwise you won't have enough rows, so I created some dummy values to pad out the data you supplied.
I would be much more concerned about the 0s in the data, though, as they would significantly impact your model. If only a few rows are missing, you can estimate them, but if too many dates are missing, you need to find additional data or may need to drop some IDs.
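One simple way to estimate a few missing rows is linear interpolation between the neighbouring months. The sketch below assumes that a 0 means "missing", which may not be true for real cost data, and it only fills short gaps so that long runs of zeros remain visible for a decision about dropping the ID. The function name `fill_short_gaps` and the `max_gap` parameter are hypothetical.

```python
from typing import List

def fill_short_gaps(values: List[float], max_gap: int = 1) -> List[float]:
    """Linearly interpolate runs of zeros no longer than max_gap.

    Assumes a 0 means "missing". Runs at the start or end of the series,
    or longer than max_gap, are left untouched.
    """
    filled = list(values)
    i, n = 0, len(values)
    while i < n:
        if values[i] != 0:
            i += 1
            continue
        start = i
        while i < n and values[i] == 0:
            i += 1
        end = i  # first index after the run of zeros
        run = end - start
        if start > 0 and end < n and run <= max_gap:
            left, right = values[start - 1], values[end]
            step = (right - left) / (run + 1)
            for k in range(run):
                filled[start + k] = left + step * (k + 1)
    return filled

# Made-up monthly values: one isolated zero and one run of three zeros
monthly = [100.0, 0.0, 120.0, 0.0, 0.0, 0.0, 150.0]
print(fill_short_gaps(monthly, max_gap=1))
# [100.0, 110.0, 120.0, 0.0, 0.0, 0.0, 150.0]
```

The isolated zero is replaced by the midpoint of its neighbours, while the three-month gap is left as-is because it exceeds `max_gap`.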
Hi @leozhang2work Thanks for the workflow. I've gone through it and found that the forecast is a constant value throughout the year for most of the IDs. Why is that?
The default setting of the ARIMA tool is to use the most recent value as the forecast for all future values (this is called the naïve method). Now that the tool is working, it is up to the user to configure the model for the particular scenario, based on the available data and the hypothesis being tested. The model customization settings let the user adjust parameters such as differencing, seasonality, drift, and many more.
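To make the naïve method concrete, here is a small sketch that is independent of the Alteryx tools. It also shows the seasonal naïve variant (repeat the value from the same period one season earlier), which is not something suggested in this thread but illustrates why accounting for seasonality produces forecasts that differ by month. The function names and data are made up, and the seasonal version assumes the history covers at least one full season.

```python
from typing import List

def naive_forecast(history: List[float], horizon: int) -> List[float]:
    """Repeat the most recent observation for every future period."""
    return [history[-1]] * horizon

def seasonal_naive_forecast(history: List[float], horizon: int, period: int = 12) -> List[float]:
    """Repeat the value observed one season earlier (e.g. same month last year)."""
    last_season = history[-period:]
    return [last_season[h % period] for h in range(horizon)]

# Made-up data with a season length of 3 to keep the example short
history = [10.0, 12.0, 15.0, 11.0, 13.0, 16.0]
print(naive_forecast(history, 3))                    # [16.0, 16.0, 16.0]
print(seasonal_naive_forecast(history, 3, period=3)) # [11.0, 13.0, 16.0]
```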
Here's a link to the suggested handbook for time series modeling:
The model chosen here is ETS. If there is not much fluctuation in the data, it is likely to return a constant model, i.e. no trend and no seasonality.
You can build an individual ETS model for a single ID to see this.
In this case, A, N, N stands for Additive error, None for trend, and None for seasonality. With no trend or seasonal component the forecast is flat, hence the constant value from the best model.
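The point forecasts of an ETS(A,N,N) model are those of simple exponential smoothing, which is why every future month gets the same number. The sketch below is only an illustration: `alpha = 0.5` and using the first observation as the initial level are arbitrary choices here, whereas a fitted model would estimate both from the data.

```python
from typing import List

def ses_forecast(history: List[float], horizon: int, alpha: float = 0.5) -> List[float]:
    """Simple exponential smoothing: the point forecasts of an ETS(A,N,N) model.

    alpha and the initial level (first observation) are illustrative choices;
    a fitted model would estimate both from the data.
    """
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level  # update the smoothed level
    # No trend or seasonal component, so every horizon gets the same level
    return [level] * horizon

print(ses_forecast([100.0, 120.0, 110.0], horizon=3))
# [110.0, 110.0, 110.0]
```

Only the level is carried forward, so the forecast cannot vary by month unless a trend or seasonal component is added to the model.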
But I need different predictions for each month. A constant value doesn't help my project. Is there any other way?