

Predicting the future dates based on the historic data

Pravallika20
8 - Asteroid

Hi all,

 

We have three years of data in a column, and we want to predict the next dates for that same column. Can anyone suggest how to do this? Sample data is below.

 

SUBMIT_DATE | LAST_RESOLVED_DATE | MTTR_MINS | DESCRIPTION
11/5/2016 15:52 | 11/9/2016 6:44 | 5220 | System: CPU Utilization is 99.03% busy - User: 60.14% System
5/26/2017 16:16 | 5/31/2017 18:54 | 7380 | System: CPU Utilization is 99.59% busy - User: 99.24% System
9/27/2016 21:08 | 10/2/2016 3:39 | 6180 | System: CPU Utilization is 99.97% busy - User: 3.10% System
9/29/2016 18:24 | 10/2/2016 14:46 | 4080 | System: CPU Utilization is 99.80% busy - User: 98.92% System
9/22/2016 19:31 | 9/25/2016 15:29 | 4080 | System: CPU Utilization is 99.90% busy - User: 98.89% System
2 REPLIES
pedrodrfaria
13 - Pulsar

Hi @Pravallika20 

 

You are looking for a Predictive Model.

 

To predict the next date, you first need to define the predictor variables: which columns should be used to predict the date column?

 

Do you have any experience working with predictive modeling?

danilang
19 - Altair

Hi @Pravallika20 

 

Five data points are not enough for any forecasting model. If you had several thousand, you could try a time series forecast like this:

 

1) Use a Generate Rows tool to generate all valid dates for the last 5 years.

2) Join this data to your input and create a new field called event that is 1 if the event occurred on that date and 0 if it didn't.

3) Pass this to an ETS tool with event as the dependent variable and look at the results.
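
For anyone who wants to see the shape of the data these three steps produce, here is a rough sketch in plain Python rather than Alteryx. It is only an illustration under assumptions: the function names are made up for this example, the calendar spans only the first to last sample date (not the full 5 years), and step 3 uses a hand-rolled simple exponential smoothing as a stand-in for the ETS tool, not the tool's actual model.

```python
from datetime import date, datetime, timedelta

# Submit timestamps copied from the sample rows in the question.
SUBMIT_DATES = [
    "11/5/2016 15:52",
    "5/26/2017 16:16",
    "9/27/2016 21:08",
    "9/29/2016 18:24",
    "9/22/2016 19:31",
]


def daily_calendar(start: date, end: date) -> list[date]:
    """Step 1: every calendar date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def event_series(calendar: list[date], event_days: set[date]) -> list[int]:
    """Step 2: 1 if an event was submitted on that date, else 0."""
    return [1 if day in event_days else 0 for day in calendar]


def simple_exp_smoothing(series: list[int], alpha: float = 0.1) -> float:
    """Step 3 stand-in (assumption): one-step-ahead forecast from simple
    exponential smoothing. alpha=0.1 is an arbitrary illustrative value."""
    level = float(series[0])
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Keep only the date part of each timestamp; the series is daily.
event_days = {datetime.strptime(s, "%m/%d/%Y %H:%M").date() for s in SUBMIT_DATES}
calendar = daily_calendar(min(event_days), max(event_days))
events = event_series(calendar, event_days)
forecast = simple_exp_smoothing(events)

print(f"{len(calendar)} days, {sum(events)} event days")
print(f"smoothed event rate for the next day: {forecast:.4f}")
```

With only five event days in the whole calendar, the series is almost entirely zeros, which is why so few points give a forecast model nothing to work with.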

 

Even once you do this, there's no guarantee of success, because of the domain of the problem. High server utilization is very rarely a function of time alone. It's generally caused by other factors, such as the number of processes running, peak process utilization, and the number of concurrent users. These factors are not captured in your input data set, so even if time series analysis finds a pattern, you'll have no understanding of the cause of the problem.

 

Dan 

 

 

 

 

 

 
