

Promote Server Sizing Guidance

DavidHa

Introduction

To ensure that prediction requests are responsive and that an Alteryx Promote instance can scale with user and model demands, the size of the supporting infrastructure should be carefully considered. This article will review the main factors that influence system performance and provide guidance on sizing for an initial deployment. Details should always be reviewed with your Alteryx representative to understand how these recommendations might be customized to best fit your use case and requirements.

 

 

For helpful background information, we recommend reading the article An Overview of Promote's Architecture.

 

 

Minimum Requirements

Promote has a minimum system requirement of 3 machines, each with 4 cores and 16GB of RAM:
 

[Image: Promote minimum system requirements (Promote_min_requirements.PNG)]

 

This configuration is typically sufficient for a development environment. For a production environment, however, several factors should be understood to ensure the environment is sized properly. The following sections introduce these factors and how they impact resource consumption.

 

 

Number of Models

Each model deployed to Promote creates a new Docker service with two containers by default. These containers must always be ready to receive prediction requests, and thus are consuming system resources (CPU, RAM, Disk). The model size and complexity also contribute to the amount of system resources that are consumed.
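
Because each model maps to a Docker service, you can see this relationship directly on a Promote node. The sketch below shows what the model services might look like for the two models used in the docker stats example later in this article; the IDs and image tags are illustrative, and Promote's own internal services are omitted.

# docker service ls    # one service per deployed model; names/IDs illustrative
ID             NAME                             MODE         REPLICAS   IMAGE
kx8p2v0qln3m   joeuser-irene-2                  replicated   2/2        joeuser-irene-2:latest
r9t4w7y1zq5b   sallyuser-CampaignRStudioNew-1   replicated   2/2        sallyuser-CampaignRStudioNew-1:latest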

 

 

Replication Factor

The replication factor setting determines the number of Docker tasks & containers to deploy to support a given model. The default value is two, which means each model will have two containers running to service incoming prediction requests. This provides redundancy in the case of failure and allows for two prediction requests to be handled concurrently. For a development environment, this number could be reduced to 1. For high demand production environments, the number of replicas can be increased to handle the workload with fast response times.

 

The default replication factor is 2.
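
If you want to confirm the replica count for a particular model at the Docker level, you can inspect its service, as in the sketch below. The service name is illustrative; the replication factor itself should be managed through Promote's settings rather than by scaling services manually.

# docker service inspect joeuser-irene-2 --format '{{.Spec.Mode.Replicated.Replicas}}'    # service name is illustrative
2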

 

 

Model Size

Promote models are deployed as Docker containers which are running instances of Docker images. These containers utilize memory, so the size of the model (the amount of memory it consumes) directly affects the number of models that can be deployed onto a machine. The goal is to maximize the number of models that can be deployed on a machine while still performing well and not exhausting the memory.

 

 

A useful command for understanding the memory size of models deployed in a Promote environment is below.

 

# docker ps -q | xargs docker stats --no-stream
CONTAINER ID   NAME                                                         CPU %    MEM USAGE / LIMIT      MEM %    NET I/O          BLOCK I/O        PIDS
0ca3a80f0f43   joeuser-irene-2.1.wb3brhtaz6zxe1l40a5gtmcvp                  0.00%    102.9MiB / 31.37GiB    0.32%    123MB / 115MB    9.1MB / 120kB    29
71813bc3a453   sallyuser-CampaignRStudioNew-1.1.zkpsdf6cu9as4n19zhmxlzqe8   6.88%    228.3MiB / 31.37GiB    0.71%    45MB / 40.8MB    0B / 76.8kB      33

 

In this example, Sally's model is using 228MB while Joe's is only using 102MB. This amount will fluctuate a bit as prediction requests come and go, but overall this gives a good idea of the model's memory requirements.
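
These per-container numbers add up quickly once replicas are included. As a rough, illustrative calculation, assume 30 deployed models, the default 2 replicas each, and roughly 200 MiB per container:

# echo $(( 30 * 2 * 200 )) MiB    # 30 models x 2 replicas x ~200 MiB each (assumed)
12000 MiB

That is roughly 12 GiB of memory for the model containers alone, before accounting for Promote's own services and operating system overhead.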

 

 

Frequency of Prediction Requests

Prediction requests to models require CPU time to execute. The more frequently prediction requests come in, the higher the resulting CPU utilization will be.
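
A rough back-of-the-envelope estimate can relate request rate to CPU demand. In the sketch below, the 50 requests per second and 40 ms of CPU time per prediction are assumed values purely for illustration:

# awk 'BEGIN { rate = 50; cpu = 0.040; printf "%.1f cores kept busy\n", rate * cpu }'    # assumed: 50 req/s, 40 ms CPU per prediction
2.0 cores kept busy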
 

Complexity of Models

Not all models are created equal. A simple logistic regression doesn't require a large amount of resources to derive a prediction and typically responds in a few milliseconds. However, a much more complicated model such as Gradient Boosting could crunch away for several seconds consuming CPU cycles before responding.
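
A simple way to get a feel for a model's complexity in practice is to time a single prediction request against its REST endpoint. The endpoint URL and input file below are placeholders (use the sample request shown for your model in the Promote UI), and the timing is illustrative of a simple model.

# time curl -s -o /dev/null -X POST "$MODEL_ENDPOINT" -H "Content-Type: application/json" -d @sample-input.json    # placeholder endpoint and payload

real    0m0.012s
user    0m0.004s
sys     0m0.002s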

 

 

Prediction Logging

Prediction Logging stores information for every prediction request, which can be viewed in the Model Overview -> History tab. This includes the date and time of the request, the inputs to the model prediction, and the predicted output from the model. This data is stored for up to 14 days, with a maximum storage size of 50 GB.
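
To estimate how quickly those limits come into play, some simple arithmetic helps. The numbers below are assumptions for illustration: roughly 2 KB stored per logged prediction and 1 million predictions per day. Whichever limit is reached first (14 days or 50 GB) applies.

# awk 'BEGIN { gb = 1000000 * 2048 / 1e9; printf "%.1f GB/day, ~%.0f days to 50 GB\n", gb, 50 / gb }'    # assumed: ~2 KB per logged prediction, 1M predictions/day
2.0 GB/day, ~24 days to 50 GB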

 

 

Prediction logging can be set for Dev & Staging models, or for Production models.

 

Prediction Logging can be useful for testing and validating models in a Development / Staging environment, or for auditing in a Production environment. There is, however, a performance cost to enabling Prediction Logging: all of the prediction information is logged and sent to Elasticsearch to be indexed for quick retrieval later. The overhead of this setting will vary based on many of the factors mentioned above, but as an example, we've seen CPU utilization double when enabling Prediction Logging in high-volume environments.

 

Recommendations

Every organization's use cases, models, workloads, and data sizes are different. The recommendations below are suggested starting points and are based on:

 

  • Default Replication Factor = 2
  • Small to medium model sizes (100 MB - 200 MB memory requirement)
  • Simple model complexity (predictions take only a few milliseconds)
  • Infrequent prediction requests
  • Prediction Logging ON

 

We've categorized the environments as Small, Medium, or Large based on the total number of models (which includes Dev, Staging, and Production).
 

Size   | Total Number of Models | Minimum Recommended Cores | Minimum Recommended RAM | 3-Machine Config | 4-Machine Config | 6-Machine Config
Small  | 0 - 19                 | 12                        | 48 GB                   | 4 cores / 16 GB  | N/A              | N/A
Medium | 20 - 39                | 24                        | 96 GB                   | 8 cores / 32 GB  | 6 cores / 24 GB  | 4 cores / 16 GB
Large  | 40+                    | 36                        | 144 GB                  | 12 cores / 48 GB | 9 cores / 36 GB  | 6 cores / 24 GB

 

 

You'll notice a 4GB per core ratio. In most cases, this works well. If model sizes are much larger than what is shown here then we recommend increasing that ratio, perhaps to 8GB per core. You'll also notice for the Medium and Large environments, there are options for 3 large machines or multiple smaller machines. Let's dive into the reasons to consider each.
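
As a quick sanity check of that 4 GB-per-core ratio, the Medium row of the table works out as shown below (24 recommended cores, split across 3 machines); the result matches the 3-machine column.

# cores=24; echo "$(( cores * 4 )) GB RAM total; per machine: $(( cores / 3 )) cores / $(( cores * 4 / 3 )) GB"    # Medium tier: 4 GB per core, 3 machines
96 GB RAM total; per machine: 8 cores / 32 GB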

 

 

Scaling Vertically (adding cores) vs Horizontally (adding machines)

Scaling Promote vertically by increasing the number of processing cores on each machine gives your predictive models additional CPU power for processing incoming requests, without adding more machines to manage.

 

 

Scaling Promote horizontally by adding machines gives you more redundancy and protection in the event a machine fails, and also allows you to configure a higher replication factor for the models. This provides greater concurrency to support models with high-volume prediction request rates.
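
Promote's machines are members of a Docker Swarm cluster (see the architecture overview referenced above), so after scaling out you can verify that each machine has joined and is healthy. The hostnames and IDs below are illustrative for a 4-machine configuration.

# docker node ls    # hostnames and IDs illustrative
ID                          HOSTNAME     STATUS   AVAILABILITY   MANAGER STATUS
p1a2b3c4d5e6f7g8h9i0j1k2 *  promote-01   Ready    Active         Leader
q2b3c4d5e6f7g8h9i0j1k2l3    promote-02   Ready    Active         Reachable
r3c4d5e6f7g8h9i0j1k2l3m4    promote-03   Ready    Active         Reachable
s4d5e6f7g8h9i0j1k2l3m4n5    promote-04   Ready    Active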

 

 

Conclusion

This article has shown some of the key factors to consider when designing a new Alteryx Promote environment. Understanding these factors will help design an environment that can support fast response times to user prediction requests and scale as workload demands increase. Details should always be reviewed with your Alteryx representative to understand how these recommendations might be customized to best fit your use case and requirements.

 

 

Other Resources