In 2020.2, Alteryx released the new Alteryx Multi-threaded Processing (AMP) Engine. The AMP engine is purpose-built to enable lightning-fast analytic execution. There are many resources available that provide an overview of the AMP engine, its purpose, capabilities, usage, and recommendations. Here are a few I'd recommend starting with:
The purpose of this blog is to explore what kind of performance benefits one might see by switching to the AMP Engine! Please note, this is NOT a performance benchmarking paper. These are observations meant to show you possible performance improvements that can be seen by using the AMP Engine. The exact performance differences will be determined by the tools used in the workflows, data sizes being analyzed by the workflow, and underlying hardware.
To keep results consistent and repeatable, I didn't want to use my laptop, where other applications could affect the timings. Instead, I went to AWS and created two EC2 instances, each with 4 cores (8 vCPUs) and 16 GB of RAM. I installed Alteryx Server 2020.2 and configured them per the diagram below, with one machine serving as the Controller & Gallery and the other serving as a dedicated Worker. This provided a controlled testing environment where the Worker was isolated and the only variable changed between runs was the engine: E1 or AMP.
To evaluate different workflow patterns, I settled on three workflows: a traditional Prep & Blend workflow, a spatial analysis workflow, and a predictive model-building workflow.
The Prep & Blend workflow is a familiar one that joins two data sets then sorts and summarizes the output. These types of workflows are typically memory intensive as they require all records to be read in before the sorts, joins, or summarizations can be performed.
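To make the "all records must be read in first" point concrete, here is a minimal plain-Python sketch of a join, sort, and summarize pipeline. It is only an illustration of why these operations are blocking, not how either Alteryx engine is implemented; the function name, field names (`customer_id`, `region`, `amount`), and sample shape are all assumptions made up for this example.

```python
from collections import defaultdict
from operator import itemgetter


def join_sort_summarize(orders, customers):
    """Inner-join orders to customers, sort the result, and total amounts per region.

    Hypothetical helper for illustration only; field names are assumptions.
    """
    # Join (build side): the whole customer input is loaded into a lookup
    # table before a single joined row can be produced.
    by_id = {c["customer_id"]: c for c in customers}

    # Join (probe side): keep only orders that have a matching customer.
    joined = [
        {**o, "region": by_id[o["customer_id"]]["region"]}
        for o in orders
        if o["customer_id"] in by_id
    ]

    # Sort: the first output row is not known until the last input row
    # has been seen, so every joined record is held in memory here.
    joined.sort(key=itemgetter("region", "amount"))

    # Summarize: a group total is only final once all rows are consumed.
    totals = defaultdict(float)
    for row in joined:
        totals[row["region"]] += row["amount"]
    return joined, dict(totals)
```

Each of the three stages has to hold its full input (or a structure built from it) before it can emit final results, which is why this style of workflow tends to be memory intensive.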
The spatial workflow uses spatial tools, including Spatial Info and Find Nearest, which can be CPU intensive.
The predictive workflow uses the R-based predictive tools to build two models (a logistic regression and a boosted model), then uses the Model Comparison tool to determine the champion model.
Each workflow was executed several times with both the E1 engine and the AMP engine to confirm the results were consistently repeatable. The average execution times are shown below. Note that each workflow produced the same outputs with E1 as it did with AMP.
The results show a staggering 98x faster execution time for the Prep & Blend workflow, a 5x faster execution time for the spatial workflow, and no change for the predictive workflow. These results will be explained in more detail below.
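For readers who want to run a similar comparison on their own workflows, a multiplier like "5x faster" is presumably the ratio of the average E1 run time to the average AMP run time. The sketch below shows that arithmetic; the `speedup` function is a hypothetical helper, and the timings are placeholders, not the measurements from this test.

```python
from statistics import mean


def speedup(baseline_runs, candidate_runs):
    """Return how many times faster the candidate is, based on mean run times.

    Both arguments are lists of wall-clock run times in seconds.
    """
    if not baseline_runs or not candidate_runs:
        raise ValueError("each engine needs at least one timed run")
    candidate_avg = mean(candidate_runs)
    if candidate_avg <= 0:
        raise ValueError("run times must be positive")
    return mean(baseline_runs) / candidate_avg


# Placeholder timings in seconds (assumed values, NOT results from this post).
e1_runs = [120.0, 118.0, 122.0]
amp_runs = [30.0, 29.0, 31.0]
print(f"{speedup(e1_runs, amp_runs):.1f}x faster")  # prints "4.0x faster"
```

Averaging several runs per engine, as was done here, smooths out run-to-run noise before the ratio is taken.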
The spatial workflow saw a substantial benefit from the AMP engine. This is easily explained by looking at the list of tools that have been converted to AMP (Tool Use with AMP). The Spatial Info tool has been fully converted, providing a multi-threaded execution benefit. The Find Nearest tool has been partially converted, as the drive time/distance calculations still use the original E1 engine. However, the configuration used in this test did not use drive time, so the full benefit of AMP was realized.
The predictive workflow execution time was the same with the E1 and AMP engines. This was expected as the predictive tools have not been converted to use the AMP engine. A majority of the predictive workflow execution time is consumed by the R processes, which are externally launched and executed outside of the control of the engine.
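The reasoning here is an instance of Amdahl's law: if only a fraction of the run time is spent in work that the new engine accelerates, the overall speedup is capped by the part that stays the same. A small sketch follows; the fractions and speedups in the examples are hypothetical, since this post does not break down where the predictive workflow spends its time.

```python
def overall_speedup(accelerated_fraction, component_speedup):
    """Amdahl's law: overall speedup when only part of the run time gets faster.

    accelerated_fraction: share of the original run time (0.0 to 1.0) spent in
        work that benefits from the faster engine.
    component_speedup: how many times faster that share becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    unchanged = 1.0 - accelerated_fraction
    return 1.0 / (unchanged + accelerated_fraction / component_speedup)


# Hypothetical case: nothing in the workflow is accelerated, so no gain.
print(overall_speedup(0.0, 100.0))  # prints 1.0

# Hypothetical case: 5% of the time is accelerated 10x; the rest is external R.
print(round(overall_speedup(0.05, 10.0), 3))  # prints 1.047
```

Even a very large speedup on a small slice of the run time barely moves the total, which matches the unchanged result for the predictive workflow.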
This article has shown some of the performance benefits that might be seen from using the AMP engine. The important takeaways here are that:
Stay tuned for more entries in the AMP Engine Technical Deep Dive series from @AdamR_AYX. If you have any performance observations from using the AMP Engine, please leave a comment below to share your feedback. We would love to hear of your results!
David has the privilege to lead the Alteryx Solutions Architecture team helping customers understand the Alteryx platform, how it integrates with their existing IT infrastructure and technology stack, and how Alteryx can provide high performance and advanced analytics. He's passionate about learning new technologies and recognizing how they can be leveraged to solve organizations' business problems.