How should I scale Alteryx Server? This is probably one of the most frequently asked questions we get about Alteryx Server. There are some great blogs on the Alteryx Community that discuss different facets of scalability. One of my favorites is from Steve Ahlgren on the engine team, titled “Measuring and Scaling a Private Server”. It is a great blog that discusses the performance and throughput of the Alteryx Server. But if you are new to Alteryx Server, you might not be aware of how flexible the Server is and all the options available when it comes to scalability, and that’s what I’ll cover in this post.
For those new to Alteryx Server: it was designed to be a secure and scalable platform for sharing analytics and to empower everyone in your organization to make data-driven decisions while ensuring analytic governance. We’ve got a great 5-minute demo if you want to learn more. But before we jump into scalability, let me provide some additional background…
One of the great things about Alteryx Server is that it is easy to set up and configure on-premises or in the cloud, and it can be set up and deployed within the same day. Once you deploy it, you can set up scheduling and automation, use our Alteryx APIs to extend processes into other workflows, share applications and much more. It’s truly the fastest and easiest way to deploy analytics across your organization.
The Alteryx Server includes three main components: a scheduler for workflow execution, a Gallery for sharing and collaboration, and a service layer to process the workflow. All of these components are typically installed on a single machine. But as workloads increase, you can scale Alteryx Server across multiple machines to accommodate higher levels of usage as well as to improve availability.
Both scalability and availability should be considered so that you keep your options open. The amount of hardware and software you need now might change as demand increases. After Alteryx is deployed, you can use software-monitoring tools to alert you when certain components of your system are near or at capacity and the solution needs to be scaled. By implementing recommended IT practices, you can increase the availability of key components in the Alteryx Server and minimize both planned downtime, such as maintenance and service pack installations, and unplanned downtime, such as downtime caused by a hardware failure.
With availability, the goal is to compare the costs of your current IT environment and the actual cost of downtime with the cost of a high-availability solution. Once you have determined your tolerance for downtime, your IT managers can use these numbers to decide how high an availability solution your organization should have.
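To make that comparison concrete, here is a minimal sketch of the arithmetic. It is only an illustration: the function names, the availability percentages and the dollar figures are all assumptions made for the example, not Alteryx guidance, and a real analysis would include more cost factors.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for an availability fraction (e.g. 0.99)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at a given (assumed) hourly cost."""
    return annual_downtime_hours(availability) * cost_per_hour


def ha_pays_for_itself(current_availability: float,
                       ha_availability: float,
                       cost_per_hour: float,
                       ha_annual_cost: float) -> bool:
    """True if the downtime cost avoided exceeds the yearly cost of the HA solution."""
    avoided = (annual_downtime_cost(current_availability, cost_per_hour)
               - annual_downtime_cost(ha_availability, cost_per_hour))
    return avoided > ha_annual_cost


# Hypothetical inputs: 99% availability today, 99.9% with HA,
# downtime costing 1,000 per hour, HA costing 50,000 per year.
print(ha_pays_for_itself(0.99, 0.999, 1_000, 50_000))
```

The point of the sketch is simply that the decision flips depending on how expensive an hour of downtime is for your business, which is why the tolerance has to be determined first.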
By default, Alteryx Server is set up as a single-node deployment, with the Gallery, Controller, Worker, and Database on a single machine, but it can be scaled up or scaled out depending on the requirement. The following table provides guidelines for determining the total number of nodes that may be needed, based on the total number of simultaneous users and the average amount of time it takes for a workflow to execute.
Workload size and average execution time, by number of concurrent users:

| # of concurrent users | Small (5 sec) | Medium (30 sec) | Large (1 min) | Extra Large (2 min) |
|---|---|---|---|---|
| 1-20 | 1 node | 1 node | 2 nodes | 3 nodes |
| 20-40 | 1 node | 2 nodes | 3 nodes | 4 nodes |
| 40-100 | 2 nodes | 3 nodes | 4 nodes | 5 nodes |
| 100+ | 3 nodes | 4 nodes | 5 nodes | 6 nodes |
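If you want to apply these guidelines in a capacity-planning script, the table can be encoded as a simple lookup. The sketch below is one way to do it, under a few assumptions of my own: the function name is hypothetical, a user count that sits on a band boundary (20, 40 or 100) is placed in the lower band, and an execution time that falls between two columns is rounded up to the next workload size.

```python
from bisect import bisect_left

# Inclusive upper bounds of the user bands; anything above 100 is the "100+" row.
USER_BAND_UPPER = [20, 40, 100]

# Average execution time in seconds for Small, Medium, Large, Extra Large.
WORKLOAD_UPPER_SECONDS = [5, 30, 60, 120]

# Rows follow the user bands (1-20, 20-40, 40-100, 100+); columns follow workload size.
NODES = [
    [1, 1, 2, 3],
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6],
]


def recommended_nodes(concurrent_users: int, avg_execution_seconds: float) -> int:
    """Look up the guideline node count for a user count and average run time."""
    if concurrent_users < 1:
        raise ValueError("concurrent_users must be at least 1")
    if avg_execution_seconds <= 0:
        raise ValueError("avg_execution_seconds must be positive")
    row = bisect_left(USER_BAND_UPPER, concurrent_users)
    # Assumption: run times longer than 2 minutes use the Extra Large column.
    col = min(bisect_left(WORKLOAD_UPPER_SECONDS, avg_execution_seconds),
              len(WORKLOAD_UPPER_SECONDS) - 1)
    return NODES[row][col]


print(recommended_nodes(30, 30))   # 20-40 users, Medium workload
print(recommended_nodes(150, 120)) # 100+ users, Extra Large workload
```

Treat the result as a starting point for sizing, in the same spirit as the table itself.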
So should you scale up, or scale out? Alteryx Server can be scaled in three different ways:
Availability and scalability requirements can vary depending on the analytic applications you are running and how those analytic applications will affect mission-critical business if they are unavailable. The following table provides some guidelines for determining when more hardware for Alteryx Server components may need to be added. We also recommend adding a load balancer to increase capacity and distribute web requests when additional Gallery nodes are added.
Scaling a Server “up” by adding extra worker processes to an existing node may actually make the system slower as measured by total throughput. On the other hand, scaling a Server “out” by adding one or more dedicated Worker nodes makes the system faster as measured by total throughput, even if the additional hardware is considered to be somewhat inferior.
As Steve Ahlgren pointed out in “Measuring and Scaling a Private Server”, the Engine is resource-intensive, especially with large datasets, and can tax a system more extensively than other less-demanding applications. Giving the Engine dedicated resources on a machine, meaning full access to CPUs, local hard drives and memory, can increase total throughput simply because there’s less resource contention between Engine processes. Also, consider adding an SSD to your Alteryx machine. Putting commonly used datasets onto the SSD (such as the Alteryx Core Data Bundle), as well as using it for the Engine temporary space, may provide a boost to your throughput.
If more work needs to be processed than the current simultaneous processing capacity can handle, the Engine processing capabilities can be increased by configuring multiple machines to act as Workers. Each additional Worker machine must be configured with the unique security token of the Controller to be able to communicate with it. Then, as workflow execution requests begin to queue up, each Worker machine will communicate with the Controller and the Controller will delegate a job for it to process.
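The Controller and Worker relationship described above can be pictured with a toy model. This is only a sketch of the general pattern (a shared queue, a token check, Workers asking for the next job); the class and method names are hypothetical and it does not represent Alteryx's actual protocol or API.

```python
from collections import deque


class Controller:
    """Toy controller: holds queued jobs and hands them to authenticated workers."""

    def __init__(self, token: str):
        self._token = token
        self._queue = deque()

    def submit(self, job: str) -> None:
        self._queue.append(job)

    def request_job(self, worker_token: str):
        # A worker without the controller's token cannot communicate with it.
        if worker_token != self._token:
            raise PermissionError("worker token does not match controller token")
        return self._queue.popleft() if self._queue else None


class Worker:
    """Toy worker: asks the controller for one job at a time and records it."""

    def __init__(self, name: str, controller: Controller, token: str):
        self.name = name
        self.controller = controller
        self.token = token
        self.completed = []

    def poll_once(self):
        job = self.controller.request_job(self.token)
        if job is not None:
            self.completed.append(job)  # stand-in for actually running the workflow
        return job


controller = Controller(token="example-token")
for n in range(1, 5):
    controller.submit(f"workflow-{n}")

workers = [Worker("worker-a", controller, "example-token"),
           Worker("worker-b", controller, "example-token")]

# Workers take turns polling until the queue is drained.
while any(w.poll_once() is not None for w in workers):
    pass

for w in workers:
    print(w.name, w.completed)
```

In this model, adding a Worker never requires changing the queue itself, which is the property that makes scaling out straightforward.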
If web requests increase because there are many users who need to access or share the workflows and apps published to the Gallery, the Gallery web server can be scaled out by configuring additional machines to act as web server nodes. When setting up multiple web server nodes, these nodes would generally sit behind a load balancer to help distribute web requests across all the web servers. Each individual web server will then communicate with the same Controller when there is work to be processed.
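As a rough illustration of what the load balancer does for the Gallery nodes, here is a minimal round-robin sketch. In practice the load balancer is separate infrastructure and may use a different strategy; the class name and node names below are assumptions made for the example.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next web server node in turn."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one web server node is required")
        self._nodes = cycle(nodes)

    def route(self, request) -> str:
        # The request content is ignored; only arrival order decides the node.
        return next(self._nodes)


# Hypothetical Gallery node names.
balancer = RoundRobinBalancer(["gallery-1", "gallery-2", "gallery-3"])
assignments = Counter(balancer.route(f"request-{i}") for i in range(9))
print(assignments)
```

With three nodes and nine requests, each node ends up with the same share of the traffic, which is the effect you are after when adding web server nodes.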
If a machine is configured with only the Designer and Scheduler, SQLite can be used as the persistence layer. However, if the Gallery component is enabled on the machine as well, MongoDB must be used. Alteryx includes an embedded version of MongoDB for ease of setup and use, but the embedded MongoDB is limited to one server and does not support replica sets. If you deploy your own instance of MongoDB and manage it yourself, you can follow the MongoDB recommendations for scalability and redundancy.
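The persistence rules in the previous paragraph boil down to a small decision, sketched below. The function and field names are hypothetical and exist only to restate the rules; this is not an Alteryx configuration API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistenceChoice:
    engine: str
    supports_replica_sets: bool


def choose_persistence(gallery_enabled: bool,
                       user_managed_mongodb: bool = False) -> PersistenceChoice:
    """Restates the rules above: Gallery requires MongoDB; embedded MongoDB is single-server."""
    if user_managed_mongodb:
        # A self-managed MongoDB deployment can be configured with replica sets.
        return PersistenceChoice("user-managed MongoDB", supports_replica_sets=True)
    if gallery_enabled:
        return PersistenceChoice("embedded MongoDB", supports_replica_sets=False)
    # Designer and Scheduler only: SQLite is an option.
    return PersistenceChoice("SQLite", supports_replica_sets=False)


print(choose_persistence(gallery_enabled=False))
print(choose_persistence(gallery_enabled=True))
print(choose_persistence(gallery_enabled=True, user_managed_mongodb=True))
```

If redundancy for the persistence layer matters to you, the last case is the one to plan for.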
Multiple factors should be considered when making decisions for a Server deployment. Machine specifications, number of users and their needs, types of workflows being built, automation required for jobs, and volume of data to process are all things that will impact the type of hardware and the configuration needed for optimal performance. But with the wide variety of scalability and availability options, you can choose the configuration that works best for your organization. If you are considering upgrading your organization’s analytics solution to Alteryx Server, we recommend the Alteryx Server Quick Start from Alteryx Professional Services to accelerate your transition and ensure scalability and availability to meet both your current and future needs.
If you would like to see how Alteryx has configured our own installation of Alteryx Server, take a look at our Alteryx Server Use Case posted on the documents page at http://downloads.alteryx.com/Documentation/Alteryx_Analytics_Gallery_Server_Use_Case.pdf
There are a number of ways to learn more about Alteryx Server: you can watch a 5-minute demo, download a free trial (but be sure to get some help from Sales here), visit the product page, or download a whitepaper. Let us know what questions you have and how we can help.
Josh Howard is a technology product veteran covering trends in information management and based in the Denver/Boulder, CO area. He has more than 20 years of experience in developing product and go-to-market strategies across a wide variety of data technologies including business intelligence, analytics, data integration, and database development. He is currently the Sr. Director of Product Management at Alteryx. You can follow Josh on Twitter at @Joshoward