on 04-23-2019 10:40 AM - edited on 04-30-2019 08:55 AM by SydneyF
To make good use of a machine’s resources when running Alteryx, it helps to understand how the Alteryx Engine functions. To clear up any haziness surrounding the term “Alteryx Engine”, this article covers what happens when you click the Run button, either in Alteryx Designer or in Alteryx Server.
Specifically, it covers how the Engine processes each record and how it uses the machine’s core(s) and RAM (memory) when a workflow is run.
Disk vs. RAM
The first thing to note is the difference between Disk and RAM. Processing data on Disk is slower than processing it in RAM. With that in mind, the Engine reads data from Disk one record at a time and tries to do its processing in RAM. See below for the general flow of records through a workflow (Blue is on Disk, Red is in RAM).
Record 1 is read from Disk into RAM, then moved through the Formula tool before being released back to Disk in the Browse tool. Once this is complete, the Engine moves on to Record 2, and so on.
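The record-at-a-time flow described above can be sketched with Python generators. This is only an illustration of the idea, not how the Alteryx Engine is implemented: the function names (`read_records`, `apply_formula`, `write_records`) and the `price`/`qty` fields are hypothetical, and in-memory text buffers stand in for files on Disk.

```python
import csv
import io

def read_records(source):
    """Yield one record at a time from a file-like source (stands in for the Input tool)."""
    for row in csv.DictReader(source):
        yield row

def apply_formula(records):
    """Add a computed field to each record as it passes through (stands in for the Formula tool)."""
    for record in records:
        record["total"] = float(record["price"]) * int(record["qty"])
        yield record

def write_records(records, sink):
    """Write each record out as soon as it arrives (stands in for the Browse tool)."""
    writer = None
    count = 0
    for record in records:
        if writer is None:
            # Create the writer lazily, once the first record's fields are known.
            writer = csv.DictWriter(sink, fieldnames=list(record))
            writer.writeheader()
        writer.writerow(record)
        count += 1
    return count

# In-memory buffers play the role of Disk in this sketch.
source = io.StringIO("price,qty\n2.50,4\n1.25,2\n")
sink = io.StringIO()
written = write_records(apply_formula(read_records(source)), sink)
```

Because each stage is a generator, only one record is held in memory at any moment: Record 1 travels the whole chain and is written out before Record 2 is read.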
Workflows with Multiple Streams
The next scenario to consider is a workflow with multiple streams. If there are two outgoing connections from the Input tool, will each record have to be read twice? See below for the flow of data in this scenario.
Once Record 1 is read by the Input tool, it is held in RAM while the Engine processes it through the top stream and releases the result to Disk. The Engine then resumes from the stored record to process the second stream. Holding the record in RAM means the Input tool does not have to read it from Disk a second time.
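A minimal sketch of this behavior, assuming each stream can be modeled as a function that receives the record: the record is read once and the same in-memory copy is handed to each stream in turn. The `fan_out` helper and the two example streams are hypothetical names used for illustration only.

```python
def fan_out(records, streams):
    """Read each record once and hand the same in-memory record to every stream in order."""
    reads = 0
    for record in records:
        reads += 1  # one read per record, no matter how many streams there are
        for stream in streams:
            stream(record)  # top stream finishes before the next stream starts
    return reads

top_output, bottom_output = [], []
reads = fan_out(
    [{"id": 1}, {"id": 2}],
    [
        lambda r: top_output.append(r["id"] * 10),      # hypothetical top stream
        lambda r: bottom_output.append(r["id"] + 100),  # hypothetical bottom stream
    ],
)
```

Here `reads` stays equal to the number of input records even though two streams consume them.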
Sort and Join Tools
The final scenario this article covers is what happens when the Engine hits tools like the Sort tool and the Join tool. To sort the records, the Sort tool needs ALL of them at once, not one at a time as in the previous scenarios. Take the use case of the Join tool:
Each record is read from Disk and then stored in RAM before the Join tool.
Next, the records are split up and sorted so that the Join can be processed as efficiently as possible.
After the Left Input is stored and sorted, the same process will be done for the Right Input.
After all the records are sorted in RAM, the Engine can carry out the Join by looking at the records one at a time. Because the records are sorted, the Engine can go back to processing one record at a time: each record is released to Disk before the Engine proceeds to the next.
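The steps above resemble a classic sort-merge join, which can be sketched as follows. This is an assumption made for illustration, not a description of the Engine's actual code: the `sort_merge_join` function, the `right_` prefix, and the sample fields are hypothetical, and only matched records are produced (the Join tool's unmatched Left and Right outputs are left out).

```python
from operator import itemgetter

def sort_merge_join(left, right, key):
    """Inner-join two lists of dict records on `key` by sorting both, then merging."""
    # Blocking step: both inputs must be fully held in memory and sorted.
    left_sorted = sorted(left, key=itemgetter(key))
    right_sorted = sorted(right, key=itemgetter(key))

    joined = []
    i = j = 0
    # Streaming step: walk both sorted inputs in a single forward pass.
    while i < len(left_sorted) and j < len(right_sorted):
        left_key = left_sorted[i][key]
        right_key = right_sorted[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right records that share this key.
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][key] == left_key:
                j_end += 1
            # Pair every left record with this key against that run.
            while i < len(left_sorted) and left_sorted[i][key] == left_key:
                for r in right_sorted[j:j_end]:
                    extra = {f"right_{k}": v for k, v in r.items() if k != key}
                    joined.append({**left_sorted[i], **extra})
                i += 1
            j = j_end
    return joined

left = [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}, {"id": 3, "name": "c"}]
right = [{"id": 3, "amt": 30}, {"id": 1, "amt": 10}, {"id": 1, "amt": 11}]
result = sort_merge_join(left, right, "id")
```

The two `sorted` calls are where the extra RAM goes, since every record from both inputs has to be stored before the merge loop can start.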
In this last example, the Engine is forced to use more RAM than normal because it needs to store records before processing them. Settings within Designer and Server refer to what is known as the Sort/Join Memory, which is the memory set aside for these processes.
As seen on Designer under Options > User Settings > Edit User Settings
As seen on the System Settings on the Server under Engine > General
The Sort/Join Memory is used by the Sort and Join tools, as well as by other tools known as blocking tools. A blocking tool requires that all records are read into the tool before any further processing can be done. You can see which tools are blocking tools by referring to Alteryx’s Periodic Table of Tools.
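The difference between a blocking tool and a non-blocking one can be illustrated with a small sketch that logs the order of events. The function names and the log format are hypothetical and exist only for this example.

```python
def source(values, log):
    """Yield values one at a time, logging each read."""
    for value in values:
        log.append(f"read {value}")
        yield value

def streaming_double(records, log):
    """Non-blocking: each record is emitted as soon as it arrives."""
    for record in records:
        log.append(f"emit {record * 2}")
        yield record * 2

def blocking_sort(records, log):
    """Blocking: every record must be buffered before the first one can be emitted."""
    buffered = list(records)  # consumes the entire input up front
    buffered.sort()
    for record in buffered:
        log.append(f"emit {record}")
        yield record

stream_log = []
list(streaming_double(source([3, 1, 2], stream_log), stream_log))

block_log = []
list(blocking_sort(source([3, 1, 2], block_log), block_log))
```

In `stream_log` the reads and emits alternate, while in `block_log` all three reads come before the first emit, which is why a blocking tool needs enough memory to hold its whole input.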
With this base understanding of how the Engine works, you can better leverage your machine’s resources and understand why certain processes might be taking longer than expected.
Better specs = better Alteryx-ing = better data solving!