Alteryx Designer Desktop Discussions

SOLVED

Workflow processing 2+ days

trevorwightman
8 - Asteroid

Hi All,

 

I started a workflow just over two days ago and it has still not finished. I have pasted a snapshot of the workflow below, but essentially I start with 22.5m records and section off 150k for training. I send these 150k records through 8 different predictive models, 2 models for each type: Decision Tree, Random Forest, Neural Network, and SVM. I merge the scores from each of these and feed them to a lift chart so I can see which one performs best. Additionally, I take the output of the Random Forest (1000 trees) tool, merge it with the remaining 22.35m records, and send it through the Score tool. After the Score tool I do a small amount of manipulation on the data. I expected the workflow to take some time but didn't think it would take this long. I also do not know how much longer it will take, as the percentages on the predictive tools are not representative of how much is left to go. Any thoughts?

 

Last note: There is one error but I do not think this has affected the workflow negatively as I believe this is one of those fake errors.

"Error: Forest Model (35): Forest Model: The R.exe exit code (4294967295) indicated an error."

 

trevorwightman_0-1594124551161.png

 

3 REPLIES
PeterA1
Alteryx

Hey @trevorwightman, at first glance what I can say is that the R tools are going to be somewhat resource intensive, because every single tool uses the external R program to execute. This could be using up a ton of resources at once and causing slower processing. I will brainstorm some creative ways of potentially speeding this up, but in the meantime, are you able to attach the workflow package so I can try running it? Do you have performance profiling turned on?

 

Best,

Peter

trevorwightman
8 - Asteroid

Hi Peter,

 

It may have been the case of me being overzealous with what Alteryx can handle. With the most common tools it is easy to spot where a backlog is because it will usually gain 1% at a time very slowly. Unfortunately for these tools it immediately goes to 69% even though time-wise it is not 69% done.

 

I created a dummy data set for you so you would have something to play with, see attached. It's about 1,500 times smaller than my actual data set and it took Alteryx just over 2 minutes to run. If my full data set ran at the same proportional speed it would take 56 hours to complete (which would be right about now, but it is still running). I am afraid that at scale it might take (exponentially?) longer, but I do not want to stop the workflow yet since I have 2.5 days invested in it.
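To make the extrapolation above concrete, here is a minimal sketch of the arithmetic. It assumes runtime grows as the record count raised to some exponent; the 2.24-minute figure is only a stand-in for "just over 2 minutes", and the exponent of 1.2 is a made-up value to show how fast the estimate grows if scaling is worse than linear. None of this is measured from the actual workflow.

```python
def extrapolate_hours(sample_minutes: float, scale_factor: float, exponent: float = 1.0) -> float:
    """Estimate full-run hours from a small test run.

    Assumes runtime grows as (record count) ** exponent; exponent=1.0 is linear.
    """
    return sample_minutes * (scale_factor ** exponent) / 60.0


# Assumed inputs: 2.24 minutes stands in for "just over 2 minutes",
# and 1500 is the stated size ratio between the real and dummy data sets.
SAMPLE_MINUTES = 2.24
SCALE_FACTOR = 1500

linear_hours = extrapolate_hours(SAMPLE_MINUTES, SCALE_FACTOR)
# Hypothetical exponent of 1.2, purely to illustrate worse-than-linear scaling.
superlinear_hours = extrapolate_hours(SAMPLE_MINUTES, SCALE_FACTOR, exponent=1.2)

print(f"linear estimate: {linear_hours:.0f} hours")
print(f"exponent 1.2 estimate: {superlinear_hours:.0f} hours")
```

The linear case reproduces the 56-hour figure; any exponent above 1 pushes the estimate well past it, which would be consistent with the run still going.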

 

I do not have profiling on, but when I ran it during my test I noticed that the lift chart took the longest time (about 28%).

 

This is what my machine is currently running at:

trevorwightman_0-1594134620876.png

 

Here is the processor and RAM I am using.

trevorwightman_1-1594134671881.png

 

 

Let me know

PeterA1
Alteryx

Hi @trevorwightman, I spoke with one of my colleagues, @AndrewKramer, who informed me that the data is copied for each R tool, so it's very possible that this is eating up all of your RAM and causing spillage onto disk, which is much slower. I will continue exploring ways to speed this up, potentially via some type of macro.
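As a rough way to check whether copies of the data could exhaust RAM, the sketch below estimates the in-memory size of a numeric table. It assumes every value is held as an 8-byte double (R's default numeric type) and uses a hypothetical column count of 20, since the real number of fields is not stated in this thread. Treat the result as a lower bound, not a measurement.

```python
def estimate_gib(rows: int, columns: int, bytes_per_value: int = 8, copies: int = 1) -> float:
    """Rough in-memory size in GiB for a dense numeric table held `copies` times."""
    return rows * columns * bytes_per_value * copies / 1024 ** 3


SCORING_ROWS = 22_350_000  # records sent to the Score tool, from the original post
ASSUMED_COLUMNS = 20       # hypothetical; substitute the real field count

one_copy = estimate_gib(SCORING_ROWS, ASSUMED_COLUMNS)
two_copies = estimate_gib(SCORING_ROWS, ASSUMED_COLUMNS, copies=2)

print(f"one copy:   {one_copy:.2f} GiB")
print(f"two copies: {two_copies:.2f} GiB")
```

Comparing these figures with the machine's installed RAM gives a quick sense of whether the scoring step is likely to spill to disk.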
