SOLVED

Predicting Daily through Time Series Tools showing dates not in dataset

TheOC
14 - Magnetar

hey!

Apologies if there is a glaring answer to this. I've been struggling to use the time series tools to predict on a 'daily' basis.

What I've found is that, within my model and forecast, there are dates that do not exist in the dataset, and the forecast seems to start a month after the latest date in my dataset.
For instance, I have the following data:

TheOC_0-1636418795558.png



The dates range from 2019 to today, and every date has a randomised value for 'value to predict'.

 

"2021-11-09" is the last data point in my set, but if I check the ETS or Forecast model I have configured:

TheOC_1-1636418860869.png

The prediction starts (and has a value for) 'Jan 9'.


This appears to be the same for ETS and ARIMA, so I suspect I am misunderstanding how the data needs to look before it goes into the time series tools.

Any help would be massively appreciated. I've attached my example workflow, which should highlight exactly what I mean.

Cheers,
TheOC

5 REPLIES
JoeS
Alteryx

Hi @TheOC 

 

The Time Series models do not expect there to be any missing values/dates or duplicates.

 

You'll need to make sure they are all populated before the model. The Generate Rows tool could be a great way to do this. You may then want to use the Imputation tool to fill in the missing values with a mean (this should have the least impact on the results).
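For anyone who wants to see the gap-filling idea outside Alteryx, here is a minimal Python sketch. It is only an illustration: the function name `fill_daily_gaps` and the sample values are hypothetical, and it assumes one value per day, with missing days filled by the mean of the observed values.

```python
from datetime import date, timedelta
from statistics import mean


def fill_daily_gaps(series: dict[date, float]) -> dict[date, float]:
    """Return a gap-free daily series, filling missing days with the mean."""
    if not series:
        return {}
    start, end = min(series), max(series)
    # Mean of the observed values only (assumed imputation strategy).
    fill_value = mean(series.values())
    filled = {}
    day = start
    while day <= end:
        # Keep the observed value where one exists, otherwise impute.
        filled[day] = series.get(day, fill_value)
        day += timedelta(days=1)
    return filled


# Hypothetical sample data: 2021-11-07 is missing.
sample = {
    date(2021, 11, 6): 10.0,
    date(2021, 11, 8): 20.0,
    date(2021, 11, 9): 30.0,
}
filled = fill_daily_gaps(sample)
```

Keying the series by date also rules out duplicate dates by construction, which covers the other thing the models do not expect.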

 

Out of interest, is there a reason some dates are missing?

TheOC
14 - Magnetar

Hi @JoeS 

Thanks for your response.
My apologies, I may have been confusing in how I explained this. The dates are generated by a Generate Rows tool, and there are 2 years of clean data.

My input data is perfectly fine.

My issue is that the output prediction starts in 'January', but my data stops in November, so I would expect the output to start at the end of my data.

For instance, this is the last row of my data (sorted in ascending order):

TheOC_0-1636451938251.png

 

 

But within my timeseries, the 'prediction' starts at the 9th of Jan:

TheOC_1-1636451966354.png

 

 

Cheers,
TheOC

JoeS
Alteryx

Hi @TheOC 

 

Ah sorry, I should have opened your workflow and I'd have seen that.

 

OK, so the thing missing in your workflow is telling the two models when your data starts. 

 

The models don't actually look through the rows to identify the dates (notice you don't tell them which column contains the date/time information).

 

There is an option (hidden away slightly) where you do that, but this is only valid for Weekly, Monthly, Quarterly and Annual frequencies.

 

So the dates that actually come out of the model are just period and sub_period values, and they bear no relation to the dates you put in other than that they follow straight after.

 

This is definitely a little confusing when it comes to the visualizations though, as you have spotted, as they do plot a date.

 

What I'd recommend doing is adding the date to the output data and then filling it in from there.

 

It's not the only way to do it, but this is how I quickly built it: take the max date > add a Record ID to the time series output > append the max date > use DateTimeAdd to add the Record ID (in days) to the max date.

Screenshot 2021-11-09 101654.png
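The same date arithmetic can be sketched in Python, purely as an illustration of the logic rather than of the Alteryx tools. The function name `label_forecast_dates` and the forecast values are hypothetical; the sketch assumes daily data and a record ID that starts at 1, so the first forecast lands on the day after the last observed date.

```python
from datetime import date, timedelta


def label_forecast_dates(
    last_observed: date, forecasts: list[float]
) -> list[tuple[date, float]]:
    """Attach a calendar date to each forecast row.

    Each row gets last_observed + record_id days, with record IDs
    starting at 1 (an assumption about how the rows are numbered).
    """
    labelled = []
    for record_id, value in enumerate(forecasts, start=1):
        forecast_date = last_observed + timedelta(days=record_id)
        labelled.append((forecast_date, value))
    return labelled


# Last data point from the thread; the forecast values are made up.
last_observed = date(2021, 11, 9)
labelled = label_forecast_dates(last_observed, [101.2, 99.8, 100.5])
```

With the last data point at 2021-11-09, the first forecast row is labelled 2021-11-10, which is where the prediction was expected to start.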

 

TheOC
14 - Magnetar

Hi @JoeS, thank you so much for your help 😁.

This was a really frustrating issue to troubleshoot, but in hindsight it makes sense that the model doesn't know when the data starts. For some reason I had thought that it used the other columns to work that out, since the example workflows for ARIMA/ETS start exactly when the data does. I've just double-checked, and they use the 'Series starting period' setting that you mentioned.

Have a good day buddy, and thanks once again!

 

TheOC

JoeS
Alteryx

Yeah, I remember having the same ah-ha moment a while back when I was wondering how it chose its date (or didn't!).

 

Always happy to help, thanks, and have a great day as well!
