Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
JoshH
Alteryx Alumni (Retired)

How should I scale Alteryx Server? This is probably one of the most frequently asked questions we get about Alteryx Server. There are some great blogs on the Alteryx Community that discuss different facets of scalability. One of my favorites, from Steve Ahlgren on the engine team, is titled “Measuring and Scaling a Private Server”, and it discusses the performance and throughput of the Alteryx Server. But if you are new to Alteryx Server, you might not be aware of how flexible the Server is and of all the options available when it comes to scalability, and that’s what I’ll cover in this post.

 

For those new to it, Alteryx Server was designed to be a secure and scalable platform for sharing analytics, empowering everyone in your organization to make data-driven decisions while ensuring analytic governance. We’ve got a great 5-minute demo if you want to learn more. But before we jump into scalability, let me provide some additional background…

 

One of the great things about Alteryx Server is that it is easy to set up and configure on-premises or in the cloud, and it can be deployed within the same day. Once you deploy it, you can set up scheduling and automation, use our Alteryx APIs to extend processes into other workflows, share applications and much more. It’s truly the fastest and easiest way to deploy analytics across your organization.

 

The Alteryx Server includes three main components: a Scheduler for workflow execution, a Gallery for sharing and collaboration, and a service layer to process the workflows. All of these components are typically installed on a single machine. But, as workloads increase, you can scale Alteryx Server across multiple machines to accommodate higher levels of usage as well as to improve availability.

 

Figure: Alteryx Server

 

Scalability and Availability

 

Both scalability and availability should be considered up front so that you keep flexibility for the future. The amount of hardware and software you need now might change as demand increases. After Alteryx is deployed, you can use software-monitoring tools to alert you when certain components of your system are near or at capacity and the solution needs to be scaled. By implementing recommended IT practices, you can increase the availability of key components in the Alteryx Server and minimize both planned downtime, such as maintenance and service pack installations, and unplanned downtime, such as downtime caused by a hardware failure.

 

With availability, the goal is to compare the costs of your current IT environment and the actual cost of downtime against the cost of a high availability solution. Once you have determined your tolerance for downtime, your IT managers can use these numbers to decide how high an availability solution your organization should have.
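As a rough illustration of that comparison, here is a minimal sketch in Python. The function names and every number in the usage note are hypothetical and are not Alteryx guidance. It only shows the arithmetic of weighing expected downtime cost against the yearly cost of a high availability setup.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at a given availability (0.0 to 1.0)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR * cost_per_hour


def ha_is_worthwhile(current_availability: float,
                     target_availability: float,
                     cost_per_hour: float,
                     ha_annual_cost: float) -> bool:
    """True if the downtime cost avoided exceeds the yearly cost of the HA setup."""
    saved = (annual_downtime_cost(current_availability, cost_per_hour)
             - annual_downtime_cost(target_availability, cost_per_hour))
    return saved > ha_annual_cost
```

For example, with a hypothetical downtime cost of 1,000 per hour, moving from 99% to 99.9% availability avoids roughly 78,840 per year in downtime, so `ha_is_worthwhile(0.99, 0.999, 1000, 50000)` is true, while the same move would not pay off if the high availability setup cost 100,000 per year.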

 

By default, Alteryx Server is set up as a single-node deployment, with the Gallery, Controller, Worker, and Database on a single machine, but it can be scaled up or scaled out depending on your requirements. The following table provides guidelines for determining the total number of nodes that may be needed, based on the total number of simultaneous users and the average amount of time it takes for a workflow to execute.

 

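Workload Size and Average Execution Time

| # of concurrent users | Small (5 sec) | Medium (30 sec) | Large (1 min) | Extra Large (2 min) |
|-----------------------|---------------|-----------------|---------------|---------------------|
| 1-20                  | 1 node        | 1 node          | 2 nodes       | 3 nodes             |
| 20-40                 | 1 node        | 2 nodes         | 3 nodes       | 4 nodes             |
| 40-100                | 2 nodes       | 3 nodes         | 4 nodes       | 5 nodes             |
| 100+                  | 3 nodes       | 4 nodes         | 5 nodes       | 6 nodes             |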
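If you want to apply these guidelines in a capacity-planning script, a minimal sketch follows. The function and constant names are hypothetical. Because the user bands in the guidelines overlap at 20, 40 and 100, the sketch assumes that a boundary value falls into the lower band.

```python
# Workload sizes in column order: small (5 sec), medium (30 sec),
# large (1 min), xlarge (2 min).
SIZES = ("small", "medium", "large", "xlarge")

# Each row: (upper bound of the concurrent-user band, nodes per workload size).
# Assumption: boundary values (20, 40, 100) belong to the lower band.
GUIDELINE = [
    (20, (1, 1, 2, 3)),
    (40, (1, 2, 3, 4)),
    (100, (2, 3, 4, 5)),
    (float("inf"), (3, 4, 5, 6)),
]


def recommended_nodes(concurrent_users: int, workload: str) -> int:
    """Look up the guideline node count for a user count and workload size."""
    if concurrent_users < 1:
        raise ValueError("concurrent_users must be at least 1")
    if workload not in SIZES:
        raise ValueError(f"workload must be one of {SIZES}")
    column = SIZES.index(workload)
    for upper_bound, nodes in GUIDELINE:
        if concurrent_users <= upper_bound:
            return nodes[column]
    raise AssertionError("unreachable: the last band has no upper bound")
```

For instance, `recommended_nodes(30, "large")` returns 3, matching the 20-40 row and the Large column. Treat the result as a starting point for sizing rather than a guarantee.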

 

Should I Scale Up or Scale Out?

 

So should you scale up, or scale out? Alteryx Server can be scaled in three different ways:

 

  • Scaling the Worker node for additional processing power - Scale out the Workers by creating multiple Worker nodes. This will increase the total number of workflows that can be processed at any given time.
  • Scaling the Gallery node for additional web users - Create multiple Gallery nodes and place them behind a load balancer. This can be useful if you have lots of Gallery users.
  • Scaling the Database node for availability and redundancy - Scale out the persistent databases to create multiple Database nodes. This is useful for ensuring backups and can improve overall system performance.

 

Availability and scalability requirements can vary depending on the analytic applications you are running and how those applications will affect mission-critical business if they are unavailable. The following table provides some guidelines for determining when more hardware for Alteryx Server components may need to be added. We also recommend adding a load balancer to increase capacity and distribute web requests when additional Gallery nodes are added.

 

Figure: Server scalability

 

Scaling a Server “up” by adding extra worker processes to an existing node may actually make the system slower as measured by total throughput.  On the other hand, scaling a Server “out” by adding one or more dedicated Worker nodes makes the system faster as measured by total throughput, even if the additional hardware is considered to be somewhat inferior.

 

As Steve Ahlgren pointed out in “Measuring and Scaling a Private Server”, the Engine is resource-intensive, especially with large datasets, and can tax a system more heavily than less-demanding applications. Giving the Engine dedicated resources on a machine, meaning full access to CPUs, local hard drives and memory, can increase total throughput simply because there is less resource contention between Engine processes. Also, consider adding an SSD to your Alteryx machine. Putting commonly used datasets (such as the Alteryx Core Data Bundle) onto the SSD, as well as using it for the Engine temporary space, may provide a boost to your throughput.

 

Scaling for Additional Processing Power

 

If more work needs to be processed than the current setup can handle simultaneously, the Engine processing capacity can be increased by configuring multiple machines to act as Workers. Each additional Worker machine must be configured with the unique security token of the Controller to be able to communicate with it. Then, as workflow execution requests begin to queue up, each Worker machine will communicate with the Controller and the Controller will delegate a job for it to process.
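To get a feel for how adding Workers shortens the queue, here is a minimal simulation sketch. It is not the Controller's actual scheduling logic. It assumes, as a simplification, that each Worker runs one job at a time and that queued jobs are handed out in order to whichever Worker becomes free first. The function name is hypothetical.

```python
import heapq


def time_to_drain_queue(job_durations, worker_count):
    """Seconds until every queued job has finished.

    Simplifying assumptions: each Worker processes one job at a time, and
    jobs are delegated in queue order to the Worker that frees up first.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    # Min-heap holding the time at which each Worker next becomes free.
    free_at = [0.0] * worker_count
    heapq.heapify(free_at)
    finish = 0.0
    for duration in job_durations:
        start = heapq.heappop(free_at)   # earliest available Worker
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish
```

With eight queued 30-second workflows, `time_to_drain_queue([30] * 8, 1)` gives 240 seconds, while two Workers take 120 seconds and four take 60. Real throughput also depends on hardware and resource contention, which this sketch ignores.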

 

Figure: Scaling for Additional Processing Power

 

Scaling for Additional Web Users


If web requests increase because many users need to access or share the workflows and apps published to the Gallery, the Gallery web server can be scaled out by configuring additional machines to act as web server nodes. When you set up multiple web server nodes, they are generally placed behind a load balancer to help distribute web requests across all the web servers. Each individual web server then communicates with the same Controller when there is work to be processed.
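The idea of distributing requests across Gallery nodes can be sketched with a simple round-robin policy. This is only an illustration: in practice the load balancer is a separate product, round robin is just one of several possible policies, and the node names below are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out web server nodes in a fixed rotation."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._rotation = cycle(nodes)

    def route(self):
        """Return the node that should receive the next web request."""
        return next(self._rotation)


# Hypothetical Gallery node names, for illustration only.
balancer = RoundRobinBalancer(["gallery-1", "gallery-2", "gallery-3"])
first_five = [balancer.route() for _ in range(5)]
```

Here `first_five` cycles through the three nodes and wraps around, so each Gallery node receives a roughly equal share of requests.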

 

Figure: Scaling for Additional Web Users

 

Scaling for Availability and Redundancy


If a machine is configured with only the Designer and Scheduler, SQLite can be used as the persistence layer. However, if the Gallery component is enabled on the machine as well, MongoDB must be used. Alteryx includes an embedded version of MongoDB for ease of setup and use, but the embedded MongoDB is limited to one server and does not support replica sets. If you deploy your own instance of MongoDB and manage it yourself, you can follow the MongoDB recommendations for scalability and redundancy.

 

Figure: Scaling for Availability and Redundancy

In Summary

Multiple factors should be considered when making decisions for a Server deployment. Machine specifications, number of users and their needs, types of workflows being built, automation required for jobs, and volume of data to process are all things that will impact the type of hardware and the configuration needed for optimal performance. But with the wide variety of scalability and availability options, you can choose the configuration that works best for your organization. If you are considering upgrading your organization’s analytics solution to Alteryx Server, we recommend the Alteryx Server Quick Start from Alteryx Professional Services to accelerate your transition and ensure scalability and availability to meet both your current and future needs. 

 

If you would like to see how Alteryx has configured our own installation of Alteryx Server, take a look at our Alteryx Server Use Case posted on the documents page at http://downloads.alteryx.com/Documentation/Alteryx_Analytics_Gallery_Server_Use_Case.pdf

 

Want to Learn More About Alteryx Server?

There are a number of ways to learn more about Alteryx Server: you can watch a 5-minute demo, download a free trial (but be sure to get some help from Sales here), visit the product page, or download a whitepaper. Let us know what questions you have and how we can help.

Josh Howard
Sr. Director, Product Management

Josh Howard is a technology product veteran covering trends in information management and based in the Denver/Boulder, CO area. He has more than 20 years of experience in developing product and go-to-market strategies across a wide variety of data technologies including business intelligence, analytics, data integration, and database development. He is currently the Sr. Director of Product Management at Alteryx. You can follow Josh on Twitter at @Joshoward


Comments
Cristian
9 - Comet

You should explore the option of packaging the Worker and Scheduler inside the new Windows Server 2016 containers. It could be a good path to follow in order to achieve high scalability and keep the processors busy.

NaveenNaidu
5 - Atom

Does Alteryx provide or recommend any load balancer for scaling out the Gallery?

 

rjoseph
5 - Atom

What is the recommendation for scaling the Controller (assuming controller is running on a separate node)? If Gallery is scaled for additional web users, it needs Controller to fetch the status or submit jobs. If workers are scaled for adding more parallelism, these workers also need the controller. Controller can be a single point of failure. What is the recommended approach to scale Controller?