Alteryx Server System Settings Deep Dive - Worker

DavidHa
Alteryx

Introduction

This is the second article in a series that explores the Alteryx Server System Settings in depth, explaining what each setting is used for and providing context to help you determine the appropriate settings for your environment. The first article in this series is: Alteryx Server System Settings Deep Dive - Engine. In this article we will focus on the Worker. The Alteryx Service Worker is responsible for executing analytic workflows, servicing Insights, and rendering map tiles. There must be at least one machine enabled as a worker to execute workflows through the Service.

We’ll explore the Worker settings in sections matching how they are displayed in the System Settings wizard.

Worker - General

Worker-Settings-1.png

Workspace

The Workspace is where the worker stores temporary or cache files, and unpackaged workflows for use when executing workflows. By default it is the same as the controller folder. This path should point to a location that can safely hold a large number of files.

The same recommendations from the Engine “Temporary Directory” setting apply here:

Recommendations
  1. It is highly recommended to set the Workspace to a different drive than your system boot drive (C:\). If the C:\ drive on Windows fills up, the system can become unresponsive or even unstable. If an additional drive (D:\) fills up, no harm is done to Windows and the system will remain functional.
  2. The Workspace is used for frequent I/O operations to read and write data. Configuring this directory on Flash/SSD storage is a great way to improve performance by minimizing the wait times for those read and write operations to complete. See the Measuring and Scaling a Private Server blog post for more information.

Allow machine to run scheduled Alteryx workflows

Enabling this machine to run scheduled Alteryx workflows allows it to take requests to run workflows from the Scheduler or from the Gallery. In multi-node deployments, you may want to uncheck this option if you have another machine that will be running workflows, and want this machine to process map requests only.

Note, this setting needs to be enabled for the Worker to process either scheduled jobs, or manual jobs from users submitted through the Gallery.

Recommendations

  1. Ensure you have at least 1 Worker configured to run scheduled Alteryx workflows.
  2. In most cases all Workers should have this setting enabled. The exception would be if the Worker is to be used for Insights or Map Rendering only, which are covered below.

Workflows allowed to run simultaneously

This is the maximum number of scheduled workflows that are allowed to run simultaneously on this machine. You may want to increase this number to improve the responsiveness of scheduled jobs, but the overall processing time may be increased.

This is a very important setting for the success of an Alteryx Server deployment. The default value is 1. A higher value allows more workflows to run concurrently. However, it could also mean workflows take longer to process due to shared system resources. The blog post Measuring and Scaling a Private Server does a great job of explaining this in detail.

The general recommendation is to set the number of Workflows allowed to run simultaneously to ½ the number of physical cores. Many cloud providers list the number of “vCPUs” (virtual CPUs) associated with instances, which can be misleading because a vCPU typically maps to a logical core (a hardware thread), not a physical core. The correct way to identify the true number of physical cores associated with a Windows Server (whether physical or virtual machine) is the following PowerShell command:

Get-WmiObject -Class Win32_Processor | Select-Object -Property Name, Number*

Example machine with 4 physical cores and 8 virtual/logical cores

There is no one-size-fits-all recommendation here, as the best results will depend on the data sizes, types of tools in the workflow, and underlying hardware. Below are some general recommendations.

Recommendations

  1. A good starting point is ½ the number of physical cores and this should be used in most situations.
  2. If jobs are queueing, but overall system utilization (CPU and memory) is looking good, consider increasing the setting by 1 and observe the effects. If workflow runtimes are impacted or system utilization becomes too high, revert to the original setting.
  3. If workflows are all In-Database then the Worker can likely handle a higher setting.
  4. In rare situations, it may be beneficial to leave the default of 1. An example might be when processing extremely large data sets with Spatial tools. In that scenario allowing the system to dedicate all resources to processing a single workflow may produce faster execution times.
  5. This setting should be in balance with the Engine Sort/Join Memory setting documented in the Engine Deep Dive Settings article.
  6. Review this configuration with your Alteryx representative to determine an appropriate value.
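
As a rough illustration of the starting point described above, the following Python sketch computes half the physical core count. The function name is hypothetical, and rounding down for odd core counts (while never going below the default of 1) is an assumption, not something the System Settings enforce.

```python
def starting_simultaneous_workflows(physical_cores: int) -> int:
    """Suggest a starting value: half the physical cores.

    Assumption: odd core counts round down, and the result never
    drops below the default setting of 1.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return max(1, physical_cores // 2)


if __name__ == "__main__":
    for cores in (1, 4, 8, 16):
        print(cores, "physical cores ->", starting_simultaneous_workflows(cores))
```

Treat the result only as a first value to test, then adjust it by observing queueing and system utilization as described in the recommendations.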

Maximum sort/join memory usage (MB)

This restricts the amount of memory Alteryx uses when encountering Sort or Join tools in a workflow. A general rule for an appropriate setting is to be ½ the amount of system memory available, divided by the number of simultaneous workflows allowed to run.

Note, this setting was removed in 2019.3 as seen from the screenshot above.
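
For versions that still expose this setting, the general rule above can be worked through with a small Python sketch. The function and parameter names are hypothetical, and whether "system memory available" means installed or currently free memory is left to the caller here.

```python
def sort_join_memory_mb(system_memory_mb: int, simultaneous_workflows: int) -> int:
    """Half of the system memory, divided by the number of
    workflows allowed to run simultaneously (whole MB, rounded down)."""
    if simultaneous_workflows < 1:
        raise ValueError("simultaneous_workflows must be at least 1")
    return (system_memory_mb // 2) // simultaneous_workflows


if __name__ == "__main__":
    # Example: a 32 GB machine (32768 MB) running 2 workflows at once.
    print(sort_join_memory_mb(32768, 2))
```

The division by the number of simultaneous workflows is what keeps this value in balance with the "Workflows allowed to run simultaneously" setting: raising one without lowering the other increases the total memory the workflows may claim.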

Cancel jobs running longer than (seconds)

If you do not want jobs to run for an extended period of time, use this setting to force jobs to cancel after a certain amount of time has passed. This helps free up system resources that might otherwise be taken up by unintentionally long running jobs. This setting only applies to scheduled jobs and does not affect manual runs from the Gallery.

This setting is disabled by default and should only be enabled to prevent long-running scheduled Gallery jobs. Note, the checkbox must be selected and a value provided to enable this setting. If the timeout is reached, an error will be displayed:


Timeout1.PNG

Quality of Service (Job Priority)

In an environment where multiple workers are deployed, selecting a priority level can determine which jobs are run by each worker. For normal operation with one machine configured as a worker, set this value to 0.

The Server doc describes the use of this setting well under the “Job Priority” section:

When a job request is handled by a worker, it compares the priority level of the job to the Job Priority value for the worker. Jobs that have a value greater than or equal to the Job Priority for the worker will be handled by that worker. For example, if a worker has a Job Priority of 0 and is available, the worker will handle any request. However, a worker with a Job Priority of 3 will only handle jobs that have a value of 3 or higher. This allows resources to be reserved for higher priority requests.
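
The comparison described in the quoted passage can be sketched in Python as follows. This only models the rule as stated in the documentation excerpt; the function names and the worker names in the example are hypothetical and are not part of any Alteryx API.

```python
def worker_accepts(job_priority: int, worker_job_priority: int) -> bool:
    """A worker handles a job whose priority is greater than or equal
    to the worker's own Job Priority value."""
    return job_priority >= worker_job_priority


def workers_for_job(job_priority: int, workers: dict) -> list:
    """Return the names of workers whose Job Priority admits the job.

    `workers` maps a (hypothetical) worker name to its Job Priority.
    """
    return [name for name, priority in workers.items()
            if worker_accepts(job_priority, priority)]


if __name__ == "__main__":
    workers = {"worker-a": 0, "worker-b": 3}
    print(workers_for_job(0, workers))  # only the priority-0 worker qualifies
    print(workers_for_job(3, workers))  # both workers qualify
```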

Recommendations

  1. In a single Worker environment, leave the default value of 0.
  2. In an environment with a separate Controller, consider enabling the Controller also as a Worker and setting this value to 6. This will allow the Controller machine to process workflow validation requests only. Workflow validations occur when saving a workflow from Designer to the Server. Having the Controller process these jobs allows the Workers to remain focused on processing actual jobs, and ensures the workflow validations complete quickly since they aren’t queued up waiting for a busy Worker to finish processing a job.
  3. For more details and recommendations, see the Job Prioritization and Worker Node Assignment article.

Job assignment

A specific worker can be assigned to run a job. First, add a job tag for the worker, and then select that job tag when creating a schedule or running a workflow.

  • Run unassigned jobs: Select this option to use the worker to run jobs that have not been assigned a job tag.
  • Job tags: Add words that can be used to assign a specific worker to run a job. Separate multiple job tags with a comma. The same job tag can be added to multiple workers.

Recommendations

  1. At least one worker should have the "Run unassigned jobs" option checked. Otherwise any jobs that are submitted without a job tag will sit in the job queue indefinitely.
  2. If Job tags are used in a multi-Worker environment, consider placing each job tag on at least 2 Workers to eliminate a single point of failure.
  3. For additional details on these settings and Job tags recommendations, see the Job Prioritization and Worker Node Assignment article.
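
To make the two checks in these recommendations concrete, here is a hedged Python sketch that models workers, job tags, and the "Run unassigned jobs" option. It assumes a tagged job runs only on workers carrying that tag, and an untagged job runs only on workers with "Run unassigned jobs" checked. The class, function, and worker names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    name: str
    job_tags: frozenset = frozenset()
    run_unassigned: bool = False


def eligible_workers(workers: List[Worker], job_tag: Optional[str]) -> List[str]:
    """Names of workers that could pick up a job with the given tag."""
    if job_tag is None:
        # Untagged jobs need a worker with "Run unassigned jobs" checked.
        return [w.name for w in workers if w.run_unassigned]
    return [w.name for w in workers if job_tag in w.job_tags]


def single_point_of_failure_tags(workers: List[Worker]) -> List[str]:
    """Job tags that appear on fewer than two workers."""
    counts = {}
    for w in workers:
        for tag in w.job_tags:
            counts[tag] = counts.get(tag, 0) + 1
    return sorted(tag for tag, n in counts.items() if n < 2)


if __name__ == "__main__":
    workers = [
        Worker("w1", frozenset({"spatial"}), run_unassigned=True),
        Worker("w2", frozenset({"spatial", "finance"})),
    ]
    print(eligible_workers(workers, None))
    print(single_point_of_failure_tags(workers))
```

An empty result from `eligible_workers(workers, None)` corresponds to the situation in recommendation 1, where untagged jobs would sit in the queue indefinitely.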

Worker – Run As …

Worker-Settings-2.png

Run as a different user

If a worker machine needs to run workflows that access files or data from a location that requires specific credentials to access it, the machine can be configured to run the workflows as a specified user or account. To have the machine run as a different user, enter the Domain, Username, and Password.

If no specific credentials are provided, then by default workflows will be executed by the Alteryx Engine process using the SYSTEM user. Providing credentials allows the running workflow to access file locations that might be protected by specific permissions. It’s also possible to access databases that use trusted Windows Authentication. Any workflow credentials entered when executing from the Gallery, or default workflow credentials assigned to a workflow or subscription, will override what is specified in this setting. The article How Workflow Credentials Work on a Private Gallery explains this setup in great detail.

Worker - Mapping

Worker-Settings-3.png

Allow machine to render tiles for mapping

Enabling this machine to act as a Map Worker will allow it to render map tiles for Map Questions and the Map Input Tool. In multi-node deployments, you may want to uncheck this option if you have another machine that will process map tile requests, and if this one will be dedicated to running scheduled workflows.

When Map Questions or Map Input Tools are processed in the Gallery, the map tiles are built via a Map Render Worker. At least one Worker must be configured as a Map Render Worker for map tiles to be built. The map tiles are cached for faster response times on the Controller machine according to the Controller’s Mapping settings.

Max number of render workers

You can specify the number of processes to be used for map tile rendering. The more processes allowed, the more simultaneous tiles can be rendered, but it will take up more system resources.

For a configured Map Render Worker, you can control the number of processes that can render Map tiles. The default setting is 2, and the maximum is 10. This configures the number of Map Render Worker processes as shown below:

MapRenderWorkers.png

These processes sit idle, consuming very few resources, until map tiles are requested. When map tiles are requested, the processes consume a small amount of CPU and memory and then quickly release those resources when the job is complete.

Recommendations

  1. Configure at least 1 Worker to act as a Map Render Worker in the event map tiles are requested.
  2. Keep the default value of 2 Render Worker processes unless heavy map tile usage is needed.
  3. If extensive Map Questions and Map Input Tools will be used, consider configuring a separate Worker just as a Map Render Worker with a large number of Render Worker processes, but not to process Scheduled jobs or Insights. This will allow the machine to dedicate all resources to rendering map tiles and not conflict with running workflows.
  4. If map tiles that have been previously loaded are taking a long time to load, consider increasing the Memory Cache and Disk Cache sizes on the Controller Mapping settings.

Worker - Insights

Worker-Settings-4.png

Enable Insight Worker

The machine can be configured to act as an Insight Worker and render insights, which are interactive dashboards created in Alteryx Designer and published in a Gallery.

In order to execute and display Insights, at least one Worker must be configured as an Insight Worker. Unlike Map Rendering where the Worker processes the Map Tiles and the Controller caches them, with Insights the Worker performs both duties. The Worker both loads the Insights and caches them for fast response times on future requests to the same Insight. In the case of a multi-Worker deployment, if an Insight is cached, all future requests for the Insight will be processed by the Worker which has cached that Insight.

Recommendations

  1. In a multi-Worker environment, configuring one Worker as an Insight Worker is probably sufficient.
  2. If heavy Insight usage is expected, multiple Insight Workers are recommended.

Insights allowed to run simultaneously

The maximum number of insights that can run simultaneously on the machine. The more insights that can be run simultaneously, the more system resources used.

Insights are built and cached via Python processes. This setting controls the number of Python processes which can run simultaneously. The minimum is 2 and the maximum is 10. Example processes are below:

Insights.png


Note, these Python processes are only launched if an Insight is requested. Each Insight process may consume roughly 64 MB of memory.

Recommendations

  1. In most situations the default value of 2 should be used.
  2. If Insights are going to be used frequently, the total number of Insights allowed to run simultaneously across all Insight Workers should be equal to the number of expected concurrent Insight requests from Gallery users.
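
The sizing arithmetic behind recommendation 2 can be sketched in Python. The per-worker maximum of 10 and the rough figure of 64 MB per process come from the text above; the function names are hypothetical, and the memory figure is only an approximation.

```python
import math

MAX_INSIGHTS_PER_WORKER = 10   # maximum value of the setting, per the article
APPROX_MB_PER_PROCESS = 64     # rough per-process memory, per the article


def insight_workers_needed(expected_concurrent_requests: int) -> int:
    """Fewest Insight Workers whose combined limit can reach the
    expected number of concurrent Insight requests."""
    if expected_concurrent_requests < 0:
        raise ValueError("expected_concurrent_requests cannot be negative")
    return max(1, math.ceil(expected_concurrent_requests / MAX_INSIGHTS_PER_WORKER))


def approx_insight_memory_mb(simultaneous_insights: int) -> int:
    """Rough memory used when every allowed Insight process is busy."""
    return simultaneous_insights * APPROX_MB_PER_PROCESS


if __name__ == "__main__":
    print(insight_workers_needed(25))
    print(approx_insight_memory_mb(10))
```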

Max Cache Size (# of Cache Directories)

The maximum number of insights cached on a worker machine. Each insight consists of a description and data file, so each insight cache is a directory that contains those files.

The default value is 20. This means that up to 20 Insights can be cached on this Worker and then quickly loaded for future requests. The maximum value allowed is 100. If you need to cache more than 100 Insights then multiple Insight Workers are needed.

Recommendations

  1. The sum of the Max Cache Size across all configured Insight Workers should be greater than the number of Insights in the Gallery. This ensures that all Insights can be cached for fast response times.
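
A quick way to check this recommendation for a planned deployment is sketched below in Python. The function name is hypothetical; the per-worker limit of 100 comes from the text above, and the strict "greater than" comparison follows the wording of the recommendation.

```python
MAX_CACHE_SIZE_PER_WORKER = 100  # maximum value of the setting, per the article


def cache_covers_gallery(cache_sizes, insights_in_gallery: int) -> bool:
    """True when the summed Max Cache Size across all Insight Workers
    is greater than the number of Insights in the Gallery."""
    for size in cache_sizes:
        if size > MAX_CACHE_SIZE_PER_WORKER:
            raise ValueError("Max Cache Size cannot exceed 100 on a single worker")
    return sum(cache_sizes) > insights_in_gallery


if __name__ == "__main__":
    # One worker on the default of 20, with 15 Insights in the Gallery.
    print(cache_covers_gallery([20], 15))
    # Two workers at the maximum, with 150 Insights in the Gallery.
    print(cache_covers_gallery([100, 100], 150))
```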

Max Port, Min Port

The range of port numbers designated for use when rendering insights.

Each of the Python processes needs a port for passing the Insight to the Gallery. This setting controls the range of ports to allow. The default range covers 100 ports. However, only as many ports as the configured number of Insights allowed to run simultaneously will ever be used. For example, in a default configuration with 2 Insights allowed to run simultaneously, only ports 8700 and 8701 would be used.
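
The port usage described above can be illustrated with a short Python sketch. It assumes ports are taken from the bottom of the range upward, as the 8700/8701 example suggests, and that a 100-port range starting at 8700 ends at 8799; both the function name and the 8799 upper bound are assumptions for illustration.

```python
def insight_ports_in_use(min_port: int, max_port: int, simultaneous_insights: int) -> list:
    """Ports that would be used, assuming one port per Insight process
    and allocation starting at Min Port."""
    if min_port > max_port:
        raise ValueError("min_port must not exceed max_port")
    available = max_port - min_port + 1
    used = min(simultaneous_insights, available)
    return list(range(min_port, min_port + used))


if __name__ == "__main__":
    # Default of 2 simultaneous Insights; 8799 is an assumed upper bound.
    print(insight_ports_in_use(8700, 8799, 2))
```

This is mainly useful when planning firewall rules: the number of ports that need to be reachable follows the "Insights allowed to run simultaneously" setting rather than the full configured range.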

Conclusion

An Alteryx Server Worker has many settings that can impact behavior, user experience, and resource consumption. Understanding how each of these settings is used by the Server, and how some settings affect others, allows administrators to configure Alteryx Server optimally for their environment and usage. Note that applying any change to the System Settings restarts Alteryx Server on the respective machine, so plan changes carefully.

Comments
jacob_kahn
12 - Quasar

Thanks @DavidHa for sharing this article on the community.

 

I'd also really be happy to see an article that describes how all the System Settings work together - I have a piece-meal-like understanding of how all the settings interact together to complete the set up of the Server, but I wish I could find like a logic-map of some sort to clarify my understanding.

 

Sincerely,

the_jake_tool

adansantos
6 - Meteoroid

Thank you @DavidHa,

 

Very good article and good recommendations.

 

Congratulations!!!

Ilona_V
5 - Atom

Thank you, well explained :)