
Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
DavidHa

In 2020.2, Alteryx released the new Alteryx Multi-threaded Processing (AMP) Engine. The AMP Engine is purpose-built to enable lightning-fast analytic execution. There are many resources available that provide an overview of the AMP Engine, its purpose, capabilities, usage, and recommendations. Here are a few I'd recommend starting with:

 

The purpose of this blog is to explore what kind of performance benefits you might see by switching to the AMP Engine! Please note, this is NOT a performance benchmarking paper. These are observations meant to show the performance improvements that are possible with the AMP Engine. The exact differences will depend on the tools used in the workflows, the size of the data being analyzed, and the underlying hardware.

 

 

The Environment

 

To ensure results were consistent and repeatable, I didn't want to use my laptop, where results could be affected by other applications. So I went to AWS and created two EC2 instances, each with 4 cores (8 vCPUs) and 16 GB of RAM. I installed Alteryx Server 2020.2 and configured them per the diagram below, with one machine serving as the Controller & Gallery and the other serving as a dedicated Worker. This provided a controlled testing environment where the Worker was isolated and the only variable being changed was the engine: E1 or AMP.

 

 

ENVIRONMENT.PNG

 

 

 

The Workflows

 

To evaluate different workflow patterns, I settled on three workflows: a traditional Prep & Blend workflow, a spatial analysis workflow, and a predictive model building workflow.

 

 

Workflow #1 - Prep & Blend

 

The Prep & Blend workflow is a familiar one that joins two data sets, then sorts and summarizes the output. These types of workflows are typically memory-intensive, as they require all records to be read in before the sorts, joins, or summarizations can be performed.

 

PrepBlend_Workflow.PNG

 

 

Workflow #2 - Spatial

 

The spatial workflow uses spatial tools, including Spatial Info and Find Nearest, which can be CPU-intensive.

Spatial_Workflow.PNG

 

 

Workflow #3 - Predictive

 

The predictive workflow uses the R-based predictive tools to build two models (a logistic regression model and a boosted model), then uses the Model Comparison tool to determine the champion model.

 

Predictive_Workflow.PNG

 

 

The Results

 

Each workflow was executed several times with both the E1 engine and the AMP engine to confirm that the results were consistent and repeatable. The average execution times are shown below. The workflows produced the same outputs whether executed with E1 or with AMP.

 

 

AMP_Results.png

 

 

The results show a staggering 98x faster execution time for the Prep & Blend workflow, a 5x faster execution time for the spatial workflow, and no change for the predictive workflow. These results will be explained in more detail below.
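For readers who want to run a similar comparison on their own workflows, here is a minimal sketch of how a "times faster" figure can be derived from repeated runs: average the run times for each engine, then divide. The function name `speedup` and the timing values are hypothetical placeholders for illustration; they are not the measurements behind the results above.

```python
from statistics import mean


def speedup(e1_seconds, amp_seconds):
    """Return how many times faster AMP ran than E1, based on mean run times."""
    return mean(e1_seconds) / mean(amp_seconds)


# Hypothetical placeholder timings in seconds (NOT the measurements from this article).
e1_runs = [120.0, 118.0, 122.0]
amp_runs = [24.0, 25.0, 23.0]

print(f"{speedup(e1_runs, amp_runs):.1f}x faster")
```

Averaging several runs per engine, as was done in this test, helps smooth out run-to-run noise before the ratio is taken.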

 

 

Prep & Blend

 

  • With the E1 engine, the Sort and Join tools are single-threaded when they reach the final merge. On the machine with 8 logical processors, this means only one of them was doing any work (12.5% CPU utilization). With the AMP engine, the Sort and Join tools can use all 8 logical processors (100% CPU utilization), potentially increasing the amount of work accomplished by a factor of 8.


  • The AMP engine handles data much more efficiently through the sharing of records and the use of 4 MB packets. This is easy to see by comparing the amount of data that passed through the Join tool in the E1 execution (9.4 GB) with the AMP execution (152 MB). The AMP execution used only about 1.6% of the data that the E1 execution used.
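The arithmetic behind these two points can be checked with a short sketch. The helper names below are hypothetical, and the 1,024 MB per GB conversion is an assumption; with 1,000 MB per GB the ratio shifts slightly but still comes out at roughly 1.6%.

```python
def single_thread_utilization(logical_processors):
    """Percent of total CPU in use when exactly one logical processor is busy."""
    return 100.0 / logical_processors


def data_ratio_percent(amp_mb, e1_gb, mb_per_gb=1024):
    """AMP data volume as a percentage of E1 data volume (assumed MB-per-GB factor)."""
    return 100.0 * amp_mb / (e1_gb * mb_per_gb)


# 8 logical processors, as on the Worker machine described earlier.
print(single_thread_utilization(8))

# Join tool data volumes reported above: 152 MB (AMP) versus 9.4 GB (E1).
print(round(data_ratio_percent(152, 9.4), 2))
```

The factor-of-8 figure is an upper bound on the gain from parallelism alone; the much larger overall speedup for this workflow also reflects the reduced data volume.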

 

The E1 execution consumed 9.4 GB of data to process the Join.

 

 

 

The AMP execution consumed 152 MB of data to process the Join.

 

 

Spatial

 

The spatial workflow saw a substantial benefit from the AMP engine. This is easily explained by looking at the list of tools that have been converted to AMP (see Tool Use with AMP). The Spatial Info tool has been fully converted, providing a multi-threaded execution benefit. The Find Nearest tool has been partially converted, as the drive time/distance calculations still use the original E1 engine. However, the configuration used in this test did not use drive time, so the full benefit of AMP was realized.

 

 

Predictive

 

The predictive workflow execution time was the same with the E1 and AMP engines. This was expected, as the predictive tools have not been converted to use the AMP engine. The majority of the predictive workflow execution time is consumed by the R processes, which are launched externally and run outside the control of the engine.

 

 

Summary

 

This article has shown some of the performance benefits that might be seen from using the AMP engine. The important takeaways here are that:

  • The most commonly used tools will perform best on AMP. See the Tool Use with AMP article for the full list. 
  • The benefit of AMP typically increases as data sizes become larger.
  • Mileage will vary based on data sizes, underlying hardware, and workflow construction.


Stay tuned for more entries in the AMP Engine Technical Deep Dive series from @AdamR_AYX. If you have any performance observations from using the AMP Engine, please leave a comment below to share your feedback. We would love to hear about your results!

 

David Hare
Senior Manager, Solutions Architecture

David has the privilege to lead the Alteryx Solutions Architecture team helping customers understand the Alteryx platform, how it integrates with their existing IT infrastructure and technology stack, and how Alteryx can provide high performance and advanced analytics. He's passionate about learning new technologies and recognizing how they can be leveraged to solve organizations' business problems.

