One of the biggest challenges I see when working with Alteryx Server customers is helping them understand how many jobs can run at a time (the # of Simultaneous Workflows) and what number makes sense for their environment. This setting is documented as "Workflows allowed to run simultaneously."
The recommendation above is a starting point; the Worker System Settings Deep Dive article covers this setting in more detail with additional recommendations. What we often find is that the value is set too low, so jobs queue while resources sit unused. Alternatively, the value is set too high, overloading the system and even causing jobs to fail.
The following article is NOT a performance benchmarking paper. These are observations meant to show you the implications of this setting and to encourage you to perform your own testing with relevant workflows to understand the optimal # of Simultaneous Workflows for your environment.
My laptop didn't make for the most realistic test environment, so I went to AWS and created two EC2 instances, each with 4 cores (8 vCPUs) and 16 GB of RAM. I then installed Alteryx Server 2019.4.4 and configured them per the diagram below, with one machine serving as the Controller & Gallery, and the other machine serving as a dedicated Worker. This allows us to configure the dedicated Worker machine to allocate all resources to running jobs.
A quick word of caution when working with AWS. An EC2 instance listed as 4 vCPU (such as the m4.xlarge) typically means it only has 2 cores. Alteryx has a 4-core minimum requirement so I went with the c5.2xlarge. Information on AWS physical cores can be found here.
We can see from this output that the c5.2xlarge has 4 Cores with 8 "Logical Processors" or threads.
No tuning was performed. All Alteryx system settings were kept at their defaults, with the exception of the Logging Level, which was set to Normal rather than High. The Worker setting "Workflows allowed to run simultaneously" and the Engine setting "Default Sort/Join Memory Usage" were modified for each test, as described in the test section.
As mentioned in the introduction, this type of analysis only works if relevant workflows are used. So as a simple test, I used three different workflows to simulate various workflow patterns: Prep & Blend, Spatial, and Predictive.
The Prep & Blend workflow is a familiar one that joins two data sets then sorts and summarizes the output.
This type of workflow is of particular interest because the Join, Summarize, and Sort tools must read in all of their data before processing. Workflows like this can consume large amounts of memory, and can generate heavy disk I/O to the Engine Temp directory (swapping) if the memory needed exceeds the Sort/Join memory setting. How much memory is needed can be roughly determined by running the workflow in Designer and observing the largest value displayed:
The Spatial workflow uses some of the Spatial tools which can be CPU intensive.
The Predictive workflow uses the R-based Predictive tools to build two models (Logistic Regression and Boosted), then uses the Model Comparison tool to determine the champion model.
The R-based Predictive tools are an interesting case since they launch additional processes outside of the Alteryx Engine process. These additional processes can consume extra CPU resources and Memory beyond any limits applied to the Engine via the Sort/Join Memory or Number of Threads settings.
Here's an example to illustrate this, where I had 3 Predictive workflows running concurrently. Each has a corresponding Rterm and Rscript process. Together, the Rscript processes are consuming 36% of the CPU and 4.4 GB of memory.
The test was to observe the Average Workflow Execution Time, and the Time to Complete All 60 Workflows as the number of Simultaneous Workflows is increased. Why 60? Two reasons:
To get 60 workflow executions with equal weighting from the Prep & Blend, Spatial, and Predictive workflow types, I queued each one then looped and repeated for a total of 20 iterations. All jobs were added to the queue at once with automation. So the queue looked like this...
Predictive - job 20
Spatial - job 20
PrepBlend - job 20
...
Predictive - job 2
Spatial - job 2
PrepBlend - job 2
Predictive - job 1
Spatial - job 1
PrepBlend - job 1
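The interleaved submission order above can be sketched as follows. This is a minimal illustration of how I generated equal weighting across the three workflow types; `build_queue` is my own illustrative helper, and the actual submission to the Gallery (e.g., via the Gallery API) is not shown here.

```python
# Build the interleaved submission order used for the test:
# 20 iterations of (Predictive, Spatial, PrepBlend) -> 60 jobs total.
WORKFLOWS = ["Predictive", "Spatial", "PrepBlend"]
ITERATIONS = 20

def build_queue(workflows=WORKFLOWS, iterations=ITERATIONS):
    """Return (workflow name, iteration number) pairs in submission order."""
    return [
        (name, i)
        for i in range(1, iterations + 1)
        for name in workflows
    ]

queue = build_queue()
# 60 jobs total, submitted starting with ("Predictive", 1), ("Spatial", 1),
# ("PrepBlend", 1), and ending with ("PrepBlend", 20)
```

Each workflow type appears exactly 20 times, so no type dominates any stretch of the queue.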
At each Simultaneous Workflow count, I configured the Engine Sort/Join Memory setting based on the following recommended equation for a dedicated Worker, which is covered extensively in the Engine System Settings Deep Dive article.
The Total amount of RAM can be found via this Windows command:
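The original command output isn't reproduced here; one common way to retrieve total physical RAM on Windows (shown as an assumption, since I can't confirm the exact command used in the screenshot) is:

```shell
:: Reports total physical memory in bytes
wmic ComputerSystem get TotalPhysicalMemory
```

Divide the reported byte count by 1,048,576 to get the value in MB for the equation above.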
So for the c5.2xlarge, the recommended Sort/Join Memory setting values would be:
| Simultaneous Workflows | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Sort/Join Memory (MB) | 11,704 | 5,852 | 3,901 | 2,926 | 2,341 | 1,951 |
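The table follows a simple pattern: the per-workflow value is a fixed Engine memory budget divided evenly by the number of simultaneous workflows. A minimal sketch, taking the 11,704 MB budget from the table's n = 1 row (how the deep-dive article derives that budget from total RAM is not reproduced here):

```python
# Per-workflow Sort/Join Memory = Engine memory budget / simultaneous workflows.
# ENGINE_BUDGET_MB is taken from the n=1 row of the table above.
ENGINE_BUDGET_MB = 11_704

def sort_join_memory(simultaneous_workflows, budget_mb=ENGINE_BUDGET_MB):
    """Recommended Sort/Join Memory (MB) per Engine process."""
    return round(budget_mb / simultaneous_workflows)

table = {n: sort_join_memory(n) for n in range(1, 7)}
# {1: 11704, 2: 5852, 3: 3901, 4: 2926, 5: 2341, 6: 1951}
```

This makes the trade-off explicit: every additional simultaneous workflow shrinks the memory each Engine process can use before it starts swapping to the Engine Temp directory.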
All results have been normalized to the Simultaneous Workflow = 1 result.
Observations
The results show that for THIS environment, and THESE workflows, 3 Simultaneous Workflows was the most efficient for overall throughput. Going beyond that produces diminishing returns, overloads the system's resources, and makes individual job execution times longer than necessary.
What is clear from these results is that increasing the # of Simultaneous Workflows from the default value of 1 MAY increase the total number of jobs a Server can execute in a period of time (throughput). However, it will likely come at the cost of longer individual workflow execution times compared to running one job at a time. That is a trade-off that must be understood and evaluated. Setting the value too high can reduce overall throughput and stretch individual job execution times well beyond what is necessary.
The results show that for this environment and workload, the default recommendation of # of Simultaneous Workflows = (# of Physical Cores) / 2 is a great starting point. For a dedicated Worker like this, ((# of Physical Cores) / 2) + 1 may be reasonable as well.
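Applied to this test machine, those two heuristics can be written out directly. This is just the arithmetic from the paragraph above, not a universal rule; both values are starting points to validate against your own workload.

```python
# Starting-point heuristics for # of Simultaneous Workflows,
# applied to the c5.2xlarge used in this test (4 physical cores).
PHYSICAL_CORES = 4

default_start = PHYSICAL_CORES // 2          # (# of Physical Cores) / 2
dedicated_worker_start = default_start + 1   # possible bump for a dedicated Worker

# default_start == 2; dedicated_worker_start == 3, which matches the
# throughput sweet spot observed in this test
```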
The important takeaway here is that each organization's environment, workflows, and data sizes will vary, and that conducting your own research, evaluations, and analysis will lead to the configuration that works best for you!
David has the privilege to lead the Alteryx Solutions Architecture team helping customers understand the Alteryx platform, how it integrates with their existing IT infrastructure and technology stack, and how Alteryx can provide high performance and advanced analytics. He's passionate about learning new technologies and recognizing how they can be leveraged to solve organizations' business problems.