Alteryx Server System Settings Deep Dive - Engine
This is the first in a series of articles that explore the Alteryx Server System Settings in depth, explaining what each setting is used for and providing context to help you determine the appropriate settings for your environment. Every organization's deployment, use cases, and business requirements are different, so there is no single configuration that fits all. Understanding the implications of each setting is vital to setting up your Alteryx Server for success.
We’ll explore the system settings for Alteryx Server as of 2019.2, through a series of articles for each component, starting with the Engine. The Alteryx Engine consumes Alteryx workflows and provides high-speed data processing and analytics functionality. This process can be entirely self-contained in Alteryx Designer, scaled across an organization by the Alteryx Service, or deployed in the cloud by the Alteryx Gallery.
The Alteryx Server System Settings configuration wizard allows end users to modify the Engine configuration settings:
Any changes to the configuration settings in the System Settings wizard are applied to the RuntimeSettings.xml file, found at:
Note: this file should never be modified through an editor unless specifically requested by Alteryx Customer Support.
A few of these settings are self-explanatory and sufficiently described in the Help. However, many of these can often be confusing, and some have performance implications that aren’t well understood. The subsequent sections will dive into these settings in more detail.
“The Engine Temporary Directory is the place where temporary files used in processed workflows and apps will be placed. This setting should point to a location that is safe to write large amounts of files.”
The Engine Temporary directory is a parent directory in which each workflow executed creates a sub directory for storing data needed during the processing of the workflow. Below is not an exhaustive list, but includes the most common types of data stored in the Engine temporary directory:
Browse Tool Support - Every Browse Tool in the workflow creates a separate Alteryx Database file (.yxdb) that stores the contents of the data you see in the Browse’s Results and Configuration windows. The Browse Tool allows you to view all the data as opposed to a sample, so the created .yxdb files can be quite large if the data sets being processed are large. For that reason, use Browse Tools sparingly. NOTE: The temporary .yxdb files are only created when an Alteryx Workflow is run in Designer. When workflows are executed through the Gallery or a schedule, Browse Tools are disabled.
Browse Everywhere Support – The Browse Everywhere feature allows you to click on the output anchor of a given tool and see a sample of the data at that point in the workflow. All the data viewable in the various output anchors across a workflow is stored in one Alteryx Browse Everywhere file (.yxbe). The size of this file is determined by the number of tools in the workflow multiplied by the “Memory Limit Per Anchor” size, which is discussed later in this article. NOTE: The temporary .yxbe file is only created when an Alteryx Workflow is run in Designer. When workflows are executed through the Gallery or a schedule, Browse Everywhere is disabled.
Tool Specific Support – Some tools create temporary files as part of their processing, for example the Download and Spatial Match tools.
Paging / Swap Space – When the Engine's memory processing requirements exceed the “Default sort/join memory usage” setting (covered later in the article), temporary files (.tmp) are created to hold data that can be quickly retrieved later. The number and size of these temp files are determined by the size of the data sets being processed and the types of tools in the workflow. The Engine 101 Basics blog does a great job of describing how certain “blocking” tools need access to all rows of a dataset to process it. The presence of these blocking tools increases the likelihood that the Engine will need to use temporary files to complete its processing.
Example Engine Temp directory for a running workflow where the files described above are evident.
These files are all deleted when the workflow processing is complete. For a Designer user, this means when the Workflow is closed or upon initiating another execution of the Alteryx Workflow in Designer. For a Server user, this means upon completion of the workflow execution. Note, the Engine Temporary Directory is used when running workflows from Alteryx Designer. When running on an Alteryx Server, the Worker’s “Workspace” directory is used to store these temporary files.
It is highly recommended to set the Temporary Directory to a different drive than your system boot drive (C:\). If the C:\ drive on Windows fills up, the system can become unresponsive or even unstable. If an additional drive (D:\) fills up, no harm is done to Windows and the system will remain functional.
The Temporary Directory is used for frequent I/O operations to read and write data. Configuring this directory on Flash/SSD storage is a great way to improve performance by minimizing the wait times for those read & write operations to complete. See the Measuring and Scaling a Private Server blog post for more information.
When running in Designer, minimize the use of Browse Tools when working with large data sets to reduce workflow processing times and keep the Temp directory from filling up.
Understand that temporary directories can grow quite large during processing if working with large data sets and doing lots of sorts, joins, or other blocking operations. As an example, we saw a temporary directory grow over 150GB in size processing a 12GB data set. Your mileage will certainly vary based on data set sizes and tools used.
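To keep an eye on the points above, a small script can report how much free space is left on the drive that holds the Temporary Directory and how large the directory itself has grown. This is only a minimal sketch using the Python standard library; the function names are illustrative, and the directory used in the demo is the operating system's temp folder standing in for the configured Engine Temporary Directory.

```python
import os
import shutil
import tempfile


def free_space_gb(path: str) -> float:
    """Return the free space, in GiB, on the volume that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 ** 3)


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all files under `root`, including sub directories."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(full_path)
            except OSError:
                # Temp files can disappear while a workflow is finishing.
                continue
    return total


if __name__ == "__main__":
    # Stand-in location; point this at the configured Temporary Directory.
    temp_dir = tempfile.gettempdir()
    print(f"Free space on drive: {free_space_gb(temp_dir):.1f} GiB")
    print(f"Current temp usage:  {directory_size_bytes(temp_dir) / (1024 ** 2):.1f} MiB")
```

Running this periodically while a representative workflow executes gives a rough picture of peak temp usage for your own data sets.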
Memory Limit Per Anchor
“Define the maximum amount of memory to use to consume data for each output anchor for tools in a workflow. The default value typically does not need to be changed.”
This setting applies only when running in Designer, as mentioned in the Browse Everywhere section. The Browse Everywhere feature allows a user to see a sample of the data (up to Memory Limit Per Anchor in size) at the output anchor of every tool. The default value is 1024 KB (1 MB).
This workflow would produce a .yxbe file of approximately 12MB.
This is a great way to analyze your data without using Browse tools, which write the entire dataset contents to disk.
A sample of the data can be seen in the output anchor of each tool.
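The sizing rule described earlier (number of tools multiplied by the Memory Limit Per Anchor) can be written as a one-line estimate. The sketch below is illustrative only: the function name is hypothetical, and the figure of 12 tools is inferred from the approximately 12MB result at the default 1024 KB limit rather than stated in the article.

```python
def estimate_yxbe_size_kb(tool_count: int, memory_limit_per_anchor_kb: int = 1024) -> int:
    """Upper-bound estimate of the .yxbe file size, in KB.

    Follows the rule of thumb from the article: one sample of up to
    `memory_limit_per_anchor_kb` per tool in the workflow.
    """
    if tool_count < 0 or memory_limit_per_anchor_kb < 0:
        raise ValueError("tool_count and memory limit must be non-negative")
    return tool_count * memory_limit_per_anchor_kb


# Assumed example: a workflow with 12 tools at the default 1024 KB limit.
size_kb = estimate_yxbe_size_kb(12)
print(f"Estimated .yxbe size: {size_kb / 1024:.0f} MB")
```

Raising the limit scales the estimate linearly, which is why a larger value is better suited to a single Designer machine than to a shared Server.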
In the event of the Temporary Directory not having enough space to create the .yxbe file, the following will be logged:
Warning - Alteryx: Disk space on temp drive running low. No browse everywhere data created.
Keeping the default Memory Limit per Anchor setting makes sense for most scenarios.
In situations where you are building a workflow with a very large data set, and the 1 MB sample just isn't enough to give you the information you need, but the full Browse tool is too heavy, it might make sense to increase this value. However, make this change on the local Designer (laptop/desktop) and not the Server which is used by many end users. This change can be made in Designer under Options -> User Settings -> Edit User Settings:
Designer users can override the Memory Limit per Anchor in the Advanced User Settings.
Default sort/join memory usage
"This is the minimum amount of memory that the Engine will consume while performing operations such as Sorts and Joins within a workflow or app. Generally, this value should not be changed.”
The Engine "Default sort/join memory usage" setting and the Worker “Maximum sort/join memory usage” setting have generated a lot of questions that I hope this section clears up. There are three key points to clarify:
If making changes, the Engine "Default sort/join memory usage" is the setting that should always be specified and what we'll focus on in this article. For Designer users, this value can be specified in the Options -> User Settings -> Edit User Settings wizard:
The Default Dedicated Sort/Join Memory Usage setting
This setting is both a minimum and a maximum.
It's a minimum in that this amount of memory will be "committed" to each Engine process (running workflow) regardless of how much memory the Engine actually consumes processing the workflow. Therefore this amount of memory is reserved for each Engine process and not usable by other applications.
It's a maximum in that the Engine process will not consume more memory than this value. If the Engine needs more resources than the setting specified, it will start utilizing the temp (swap) space as described earlier. If the workflow launches additional processes, such as from the Run Command tool, or R by the use of Predictive Tools, those processes are NOT controlled by this setting.
Other processes are not controlled by the Engine Default Sort/Join Memory Usage setting
The setting is not specific to Sorts and Joins. As described in the Engine 101 Basics blog, “blocking” tools require access to all data for execution. This drives up memory consumption. Sort, Join, and Summarize operations are the most commonly used blocking tools but there are many others, easily identifiable with a red border in the Periodic Table of Alteryx Tools. Any workflow executing these blocking tools with high row count data sets (millions of rows) will consume lots of memory.
If the Sort/Join memory setting requested exceeds the amount of physical RAM available, Alteryx will revert to a lower value that is safe to commit. In that case a message like the following will be logged in the Alteryx Engine logs:
00:00:0.003 - Alteryx: Allocating requested dedicated sort/join memory would be more than available physical memory. Reverting to 2912MB of memory.
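When reviewing Engine logs, it can help to search for this message automatically. The following is a minimal sketch that matches the wording of the example message quoted above; the function name is hypothetical, and the pattern assumes the message text in your version matches that example.

```python
import re
from typing import Optional

# Matches the tail of the example message: "Reverting to 2912MB of memory."
REVERT_PATTERN = re.compile(r"Reverting to (\d+)\s*MB of memory")


def reverted_memory_mb(log_line: str) -> Optional[int]:
    """Return the fallback memory value in MB, or None if the line is unrelated."""
    match = REVERT_PATTERN.search(log_line)
    return int(match.group(1)) if match else None


example = (
    "00:00:0.003 - Alteryx: Allocating requested dedicated sort/join memory "
    "would be more than available physical memory. Reverting to 2912MB of memory."
)
print(reverted_memory_mb(example))
```

Any line that yields a value is a sign that the configured sort/join memory is larger than the machine can safely commit.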
Only make changes to the Engine "Default sort/join memory usage" setting if necessary. The default value works well for most scenarios, and we routinely see problems occur when this value is changed without understanding when and how to configure it.
The below recommendations are just starting points. It is always recommended to configure and then test with representative workflows and usage patterns. Then reconfigure and repeat the process until the optimum values are identified.
To properly calculate a reasonable Sort/Join Memory value, we must know how many workflows will be configured to run simultaneously. This number should be determined by a server sizing exercise, and for existing customers, also take into account data from the Server Usage Report, such as queue times and job execution times. For Designer only users the number of simultaneous workflows will be 1.
For Alteryx Server machines that act as both a Worker and a Controller with the Embedded MongoDB, a good starting point is:
For standalone Workers, more memory can be allocated to running workflows. In that case a good starting point would be:
The 4GB reservation ensures the OS and other system services are not starved of memory.
If predictive tools will be frequently used then lower your calculated values from above as additional memory should be reserved for the R processes.
The expectation is that the machine will only be used for Alteryx processing and not shared with other applications. If other applications will also be running, then their memory requirements need to be factored into the equations above.
Again, the above recommendations are just starting points. It is always recommended to configure and then test with representative workflows and usage patterns. Then reconfigure and repeat the process until the optimum values are identified.
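The exact starting-point formulas are not reproduced in this text, so the sketch below is only an assumption about their general shape: take the machine's physical RAM, subtract a reservation for the OS and other services (the 4GB mentioned above for standalone Workers), and divide the remainder by the number of simultaneous workflows. The function name, the default reservation, and the example figures are all hypothetical, and the result is a value to test against, not a recommendation.

```python
def sort_join_memory_mb(total_ram_mb: int,
                        simultaneous_workflows: int,
                        reserved_mb: int = 4096) -> int:
    """Assumed starting point: (physical RAM - reservation) / simultaneous workflows.

    `reserved_mb` defaults to the 4GB reservation mentioned for standalone
    Workers; a machine that also hosts the Controller and MongoDB, or that
    runs Predictive tools often, would need a larger reservation.
    """
    if simultaneous_workflows < 1:
        raise ValueError("simultaneous_workflows must be at least 1")
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0:
        raise ValueError("reservation exceeds the machine's physical RAM")
    return available_mb // simultaneous_workflows


# Hypothetical standalone Worker: 32 GB RAM, 4 simultaneous workflows.
print(sort_join_memory_mb(32 * 1024, 4))  # (32768 - 4096) // 4 = 7168
```

Because the memory is committed per Engine process, the per-workflow value multiplied by the number of simultaneous workflows should always stay below physical RAM.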
Default number of processing threads
“Define the number of processing threads tools or operations can use. The default value is the number of available processor cores plus one. Generally, this value should not be changed.”
This setting determines the number of processing threads that multi-threaded tools can use. The multi-threaded tools are identifiable in the Periodic Table of Alteryx Tools (Sort and some Spatial tools are the most notable). Configuring a higher value here may allow more parallelism, which may result in faster completion times for these multi-threaded tools, assuming the machine has the capacity to use the specified number of threads. Specifying a larger thread pool than needed can lead to an over-committed system in which the CPU is constantly context switching between threads, which may produce longer processing times.
In Windows Task Manager, the "Logical processors" metric shows us the maximum number of concurrent processing threads on that server:
To reduce workflow run times using multi-threaded tools, set the Default number of processing threads equal to the number of Logical processors.
If the server will be executing multiple workflows simultaneously, consider using a lower value to ensure that no workflows are starved of CPU.
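The two recommendations above can be expressed as a quick calculation. In this sketch, `os.cpu_count()` reports the number of logical processors on the machine where the script runs; dividing that number evenly across simultaneous workflows is an assumption made here for illustration, not a rule from the article, and the function names are hypothetical.

```python
import os


def logical_processors() -> int:
    """Number of logical processors visible to the OS (at least 1)."""
    return os.cpu_count() or 1


def threads_per_workflow(logical: int, simultaneous_workflows: int) -> int:
    """Assumed heuristic: share logical processors evenly across workflows."""
    if simultaneous_workflows < 1:
        raise ValueError("simultaneous_workflows must be at least 1")
    return max(1, logical // simultaneous_workflows)


if __name__ == "__main__":
    cores = logical_processors()
    print(f"Logical processors: {cores}")
    print(f"Single workflow:    {threads_per_workflow(cores, 1)} threads")
    print(f"Four workflows:     {threads_per_workflow(cores, 4)} threads each")
```

Treat the output as a first value to test with representative workflows rather than a final setting.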
Run engine at a lower priority
“Select if you are running other memory intensive applications simultaneously. It is also recommended that this setting be checked for a machine configured to run the Gallery.”
The Windows Scheduling Priorities doc is a great resource to understand why this setting is important. To summarize, Windows assigns time slices of CPU time to processes based on their priority level. Applications with a higher priority will be given more CPU time than applications with a lower priority. This ensures higher priority applications get more processor time when the system is heavily utilized.
Most applications default to “Normal” priority. Some critical Windows processes have a “High” priority, such as the logon and desktop window manager processes. Alteryx installs all Server components (AlteryxService, Gallery, Designer, etc…) with the “Normal” priority. By default, this includes the Engine as well. This could lead to a scenario where resource intensive workflows are running, and the Alteryx Service layer, the Gallery, or even Designer are struggling to get CPU time since they all share the same priority level. This also could inhibit mouse & keyboard inputs, and background disk flushing.
Running the Engine at a lower priority will allow these components to remain responsive even during periods of heavy workload processing.
This setting should be enabled in all Server deployments (Controller, Worker, Gallery) to ensure Windows components, and the Alteryx Server components always get priority over intensive Engine processes.
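To illustrate the Windows mechanism this setting relies on, the sketch below launches a child process with a below-normal priority class. It is not how Alteryx implements the option, and which priority class the Engine actually uses is not stated in the article; the function names are hypothetical. `subprocess.BELOW_NORMAL_PRIORITY_CLASS` only exists on Windows builds of Python, so the code falls back to no special flags elsewhere.

```python
import subprocess
import sys
from typing import List


def lower_priority_flags() -> int:
    """Return the Windows below-normal priority flag, or 0 on other platforms."""
    return getattr(subprocess, "BELOW_NORMAL_PRIORITY_CLASS", 0)


def run_at_lower_priority(command: List[str]) -> int:
    """Run `command` and return its exit code, using lower priority on Windows."""
    completed = subprocess.run(command, creationflags=lower_priority_flags())
    return completed.returncode


if __name__ == "__main__":
    # Stand-in workload: a trivial Python child process.
    exit_code = run_at_lower_priority([sys.executable, "-c", "print('child finished')"])
    print(f"Exit code: {exit_code}")
```

On a busy Windows machine, a process started this way yields CPU time to normal-priority processes, which is the effect the setting aims for with the Engine.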
The Alteryx Engine has many settings that can impact workflow performance and resource consumption. Understanding how each of these settings is used by the Engine, and how some settings affect others, will allow administrators to configure Alteryx Server optimally for their environment and usage.
How To: Adding a Worker Node
Many customers need to expand their Alteryx Server environment to support a growing number of users, additional departments, increased data sizes, frequent job queuing, or a myriad of other reasons. This document walks through the process of adding a Worker Node to an Alteryx Server environment. These steps apply regardless of the size of the existing Alteryx Server environment.
Product - Alteryx Server
Additional required assets:
Alteryx Server and Data installs
1. Provision a new physical or virtual machine meeting the Alteryx Server minimum requirements, which are 4 cores, 16GB RAM, and 1 TB Disk. It is recommended to work with your Alteryx representative to determine the optimal system setup for your use cases, data set sizes, and business requirements.
2. Install applicable ODBC Drivers and configure System or User DSNs to match the existing Worker(s). In-Database connections should also be created for any data sources where In-Database processing is being used.
3. Install Alteryx Server and any Data packages using the same install files from your existing Alteryx Server environment. This ensures version compatibility between the Controller and the new Worker Node. There are a couple of ways to verify the same version is being used.
   - Check the version reported in Gallery at https://MyGalleryHostname.com/#!help/version
   - Check the version reported by Alteryx Designer running on the existing Alteryx Server machine.
   - Ensure this matches the version reported by the file name of the Alteryx Server installable on the new Worker node.
4. On the Alteryx Server Controller machine, open the System Settings and navigate to the Controller -> General section. Copy the Controller Token and save it for step 6.
5. On the new Worker Node where Alteryx Server has been installed, launch the System Settings wizard. For the Environment Setup Type, choose Custom - Enable Worker.
6. Continue to the Controller -> Remote section. Enter your Controller Hostname and paste the Controller Token saved in step 4. Click Test to ensure the Alteryx Server version, Controller Hostname and Token match.
7. Configure the Worker and Engine settings as desired. Typically this involves matching the settings of an existing Worker, but there are situations where it makes sense to have different settings across Workers, such as the number of workflows allowed to run simultaneously, QoS settings, Job tags, Map Rendering, etc.
   An example Worker Configuration
8. Complete the System Settings wizard and validate the AlteryxService is Running on the new Worker node.
9. Validate the new Worker shows up in the Gallery Admin -> Diagnostics page in the Workers panel.
10. Ensure the new Worker has access to any locally stored Macros or data sets. It is recommended to move all locally stored Macros and data sets to a network share or similar storage device that is accessible by all Workers, and then update Workflows based on the new locations. Storing files locally and copying them to all Workers isn't recommended, as it requires a method to keep them all in sync, increases data storage requirements, and introduces version differences across the environment.
11. Finally, run a few representative workflows through the new Worker to verify everything is working as expected.
With just a few simple steps a new Worker node can be added to an Alteryx Server environment to provide redundancy, higher throughput, shorter job queue times, and many more benefits.
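Two of the checks in the steps above lend themselves to a quick script: confirming that the installer's version matches the version reported by the Gallery, and spotting Macro or data paths that point at a local drive instead of a network share. The sketch below is illustrative only; the installer file name, version number, and paths are hypothetical examples, and the version pattern simply looks for the first dotted number in a string.

```python
import re
from pathlib import PureWindowsPath
from typing import Optional

# First run of digits separated by dots, e.g. "2019.2.5.62427".
VERSION_PATTERN = re.compile(r"\d+(?:\.\d+)+")


def extract_version(text: str) -> Optional[str]:
    """Return the first dotted version number found in `text`, if any."""
    match = VERSION_PATTERN.search(text)
    return match.group(0) if match else None


def is_network_share(path: str) -> bool:
    """True when `path` is a UNC path (server and share), False for local drives."""
    return PureWindowsPath(path).drive.startswith("\\\\")


# Hypothetical values for illustration only.
installer_name = "AlteryxServerInstallx64_2019.2.5.62427.exe"
gallery_version = "2019.2.5.62427"
print(extract_version(installer_name) == gallery_version)

for macro_path in (r"C:\Macros\cleanse.yxmc", r"\\fileserver\alteryx\macros\cleanse.yxmc"):
    location = "shared" if is_network_share(macro_path) else "local to one Worker"
    print(f"{macro_path}: {location}")
```

Paths flagged as local are candidates to move to shared storage so that every Worker resolves them the same way.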
Worker System Settings
Considerations for Scaling Alteryx Server
Measuring and Scaling a Private Server
Scaling a Private Server: Five steps to greater throughput
SAML (Security Assertion Markup Language) is a standardized way of exchanging authentication and authorization credentials between different parties. The most common use for SAML is web browser single sign-on. Starting in 2018.2, Alteryx Server supports SAML. So far, SAML in Alteryx Server has been specifically validated on two providers: PingOne and Okta. In this article we will review how to configure SAML on your Alteryx Server for PingOne.
How To: Enable MongoDB logs in RuntimeSettings.xml
When trying to troubleshoot Server/Gallery issues, it can be useful to gather logs to determine if the cause is with your MongoDB. These steps will show you how to enable Mongo logging.
ALWAYS BACK UP FIRST. Make a copy of your RuntimeSettings.xml as well as your MongoDB. See this article for Backup & Recovery Best Practices.
Stop your Alteryx Service
Open File Explorer and navigate to %PROGRAMDATA%\Alteryx
Open RuntimeSettings.xml in a text editor
Under the Controller section add EmbeddedMongoDBLogPath as a key
Set the value to the path of the log file, including a directory and a file name with the .txt extension.
Start your Alteryx Service; MongoDB logs should now be written to that file.
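The edit described in these steps can be pictured with a small sketch that works on an in-memory XML string rather than the real file. Only the Controller section and the EmbeddedMongoDBLogPath key come from the steps above; the `SystemSettings` root element, the sample document, the log path, and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for RuntimeSettings.xml.
SAMPLE_SETTINGS = """<SystemSettings>
  <Controller>
  </Controller>
</SystemSettings>"""


def set_mongo_log_path(xml_text: str, log_path: str) -> str:
    """Add or update EmbeddedMongoDBLogPath under the Controller section."""
    root = ET.fromstring(xml_text)
    controller = root.find("Controller")
    if controller is None:
        raise ValueError("no Controller section found")
    node = controller.find("EmbeddedMongoDBLogPath")
    if node is None:
        node = ET.SubElement(controller, "EmbeddedMongoDBLogPath")
    node.text = log_path
    return ET.tostring(root, encoding="unicode")


# Hypothetical log location: a directory plus a file name ending in .txt.
updated = set_mongo_log_path(SAMPLE_SETTINGS, r"D:\AlteryxLogs\mongo.txt")
print(updated)
```

The printed result shows where the new key sits relative to the Controller section, which is the part to reproduce when editing the backed-up file by hand.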