
Alteryx Designer Desktop Discussions


Time Series (ARIMA) and Run Time Limits (Server)

dev_dixitt
6 - Meteoroid

Hello all, 

 

Context: 

 

The overall objective is to automate time series forecasting across multiple products.

We have around 1,000 products that need to be forecast.

 

I have developed a workflow that feeds a grid of parameter combinations, 'd' (degree of first differencing) and 'D' (level of seasonal differencing), to the ARIMA tool, updating the combination through control parameters.

 

I have 6 combinations of 'd' and 'D'. A batch macro runs and produces an ARIMA forecast for all 1,000 products for each combination. In other words, ARIMA runs 1000 x 6 = 6000 times.

 

Challenge: 

 

Our enterprise IT team, which handles Alteryx Server, has put a 2-hour limit on workflow run time and kills anything that exceeds it.

The workflow I mentioned above took 34 hours to run in Designer on a machine with 16 GB of RAM.

 

Even if the server runs it faster, I don't see it coming down to below 2 hours. 
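As a rough sanity check on that gap, here is a back-of-the-envelope sketch in Python using only the figures quoted above. It assumes, purely for illustration, that essentially all of the 34 hours is spent in the ARIMA fits and that they run one after another; the variable names are made up for this sketch.

```python
# Back-of-the-envelope runtime budget using the figures from the post.
products = 1000
combinations = 6        # (d, D) pairs in the grid
total_hours = 34        # observed Designer runtime
limit_hours = 2         # server run-time limit

fits = products * combinations

# Assumption: all of the runtime is ARIMA fitting, executed sequentially.
seconds_per_fit = total_hours * 3600 / fits
required_speedup = total_hours / limit_hours
max_fits_in_limit = int(limit_hours * 3600 / seconds_per_fit)

print(fits, round(seconds_per_fit, 1), required_speedup, max_fits_in_limit)
```

Under those assumptions each fit costs about 20 seconds, so staying inside the limit would need either a 17x speed-up or a cut to roughly 350 fits per run at the current speed.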

Points where we could use guidance and ideas:

  • Breaking the workflow down into multiple tiny workflows: this would require a lot of development effort, and we can't afford to delay the project any further. So that is a no-no.
  • Getting a virtual machine / virtual PC and running Designer on that: I've been told it could be a breach of contract with Alteryx. Can someone shed some light on this?

How can we solve for this? 

 

 

 

I am attaching an example of my use case with just 1 product. 

Any kind of help will be really appreciated. 

 

Thanks,

Dev

3 REPLIES
blakecasey
Alteryx Alumni (Retired)

Hi Dev,

 

My suggestion would first be to see what can be done to optimize this workflow. I can imagine that committing a worker to running one workflow for 34 hours can present some challenges. Based on the workflow that was attached, one strategy to reduce runtime would be to use the AMP engine. There are processes that would benefit greatly from some downstream multi-threaded processing. Workflows with a considerable number of Union and Join tools are typically good candidates for AMP, and one of our Knowledge Base articles contains more information on when to use AMP, including some best practices.

 

Additionally, using a batch macro to run ARIMA 6000 times sounds less than ideal, although changing that would take some workflow modification and development effort. This will likely not come as a surprise, but after enabling performance profiling and running the provided workflow, a large amount of the execution time is spent running the macro (containing ARIMA, which calls the predictive kit, specifically the R tool). Rather than breaking the workflow down into multiple tiny workflows, explore whether it can perform at an acceptable level without requiring ARIMA to run 6000 times.

 

While this might not bring the workflow down to 2 hours, if we can get closer to that number there may be a decent argument for your internal server admins to compromise by increasing the runtime limit slightly and scheduling the workflow at a less busy time, so that it does not cause other daily workflows to queue up and create congestion.

 

Using a VM isn't an issue by itself. The issue arises if one were to use a temporary VM that is unlicensed, or a server with more cores than it is licensed for. Those setups are not supported.

 

Other suggestions would involve scaling out Server and using job prioritization and worker tags to accommodate long-running workflows like these. This may be a more difficult conversation to have depending on the circumstances of your organization, but I felt it was nonetheless worth mentioning. I hope that makes sense and that it helps.

IraWatt
17 - Castor

hey @dev_dixitt

I have no idea how optimised this tool is, but the TS Model Factory tool (see Alteryx Help) could possibly generate models more efficiently than a batch macro + ARIMA model.

leozhang2work
10 - Fireball

So quite a bit can be discussed here.

 

TL;DR

Overall:

1: Try Model Factory tool (maybe with AMP)

Individually:

2: Reduce the full enumeration of models, as that takes most of your time.

3: Decide the differencing beforehand from a study of the plots.

4: Drop any unnecessary graph and browser output.

 

First part, the theory. I take the side that differencing should be decided beforehand rather than treated as a free parameter to test out. Different differencing settings give a very different understanding of your data, and if you are trying to maximise accuracy without regard for the model, I believe you are going to overfit.

 

Option to vary only p, q for a reason
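To illustrate what deciding the differencing up front can look like, here is a minimal sketch in Python. The series, the period of 4, and the helper name `difference` are all made up for this example; they are not taken from the attached workflow. It only shows how inspecting the differenced series can indicate which of d and D is actually needed.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Made-up series: linear trend plus a repeating 4-period pattern.
pattern = [0, 5, -3, 2]
series = [2 * t + pattern[t % 4] for t in range(16)]

first_diff = difference(series, lag=1)      # corresponds to d = 1
seasonal_diff = difference(series, lag=4)   # corresponds to D = 1 with period 4

print(first_diff)
print(seasonal_diff)
```

In this toy case the seasonal difference is already constant, while the first difference still carries the repeating pattern, so there would be little reason to search over other (d, D) pairs. Real data is noisier, so a plot of each differenced series is the practical equivalent.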

 

Second, speed.

 

I do hope you have a very good reason for this particular combination

The full enumeration accounts for most of the time spent here, more than the batching over differencing. I only ever go as high as 10 when I feel greedy.
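To give a feel for how quickly a full enumeration grows, here is a small Python sketch that counts (p, q, P, Q) combinations. The bounds used are hypothetical and chosen only for illustration; they are not read from the attached workflow, and `grid_size` is a made-up helper.

```python
from itertools import product

def grid_size(max_p, max_q, max_P, max_Q):
    """Count (p, q, P, Q) combinations when each order runs from 0 to its max."""
    orders = product(range(max_p + 1), range(max_q + 1),
                     range(max_P + 1), range(max_Q + 1))
    return sum(1 for _ in orders)

# Hypothetical bounds, for illustration only.
wide = grid_size(8, 8, 2, 2)     # 9 * 9 * 3 * 3 candidate models
narrow = grid_size(2, 2, 1, 1)   # 3 * 3 * 2 * 2 candidate models

print(wide, narrow, wide / narrow)
```

Each of those counts is per product and per (d, D) pair, so trimming the upper bounds shrinks the total work multiplicatively.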

Pretty much all p, q, P, Q are less than 2

I really don't see the reason for choosing 8 here. Rather than letting p handle 8 steps, I would recommend trying P to see whether it handles the seasonality. Have you done ACF and PACF plots to see how your data behaves?
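For readers who have not looked at an ACF before, here is a minimal pure-Python sketch of the sample autocorrelation. The series is synthetic (a flat level with a spike every 12th period) and the function name `acf` is just for this example; it is not the poster's data or an Alteryx function.

```python
def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

# Synthetic series: level of 10 with a spike to 20 every 12th period.
series = [20 if t % 12 == 11 else 10 for t in range(96)]

r = acf(series, 24)
seasonal_lag = max(range(1, 25), key=lambda k: r[k])

print(seasonal_lag, round(r[seasonal_lag], 3))
```

A dominant spike at the seasonal lag, with little correlation at the short lags, is the kind of pattern that points towards a seasonal term (P) rather than a long non-seasonal p.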

 

Third, some small adjustments.

 

As you are not looking for the model output, no need to forecast 24 periods

Given that your sample data here is ~100 rows, I wouldn't forecast more than 10% of it. The default is 6 for a reason: running the forecast more often is a better choice than forecasting a long period. Unless you have a very well-behaved time series, it is a really bad idea to forecast too many periods.

 

Optional, things you could do.

You could remove the reporting if you don't want to see the output

I see you modified the ARIMA tool. You could bin all the reporting tools here and just get the model objects. Less rendering will probably save you time as well.

 

Four, a side note as well: do put some kind of sort before your multi-row formula. I worry the AMP engine will affect it if you are not careful.
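To show why row order matters for any "previous row" calculation, here is a small Python sketch. It is not Alteryx code: the rows, the field layout, and the helper `add_previous_sales` are hypothetical and only mimic what a multi-row formula does conceptually.

```python
from operator import itemgetter

# Hypothetical rows: (product, period, sales). Order is scrambled on purpose.
rows = [("A", 3, 130), ("B", 1, 50), ("A", 1, 100), ("B", 2, 55), ("A", 2, 120)]

def add_previous_sales(records):
    """Attach the previously seen sales value for the same product to each row."""
    out, last_seen = [], {}
    for product, period, sales in records:
        out.append((product, period, sales, last_seen.get(product)))
        last_seen[product] = sales
    return out

unsorted_result = add_previous_sales(rows)
sorted_result = add_previous_sales(sorted(rows, key=itemgetter(0, 1)))

print(unsorted_result)
print(sorted_result)
```

With the scrambled input, product A's period 1 picks up period 3's sales as its "previous" value. Sorting by product and period first makes the result independent of whatever order the records arrive in.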
