To ensure that prediction requests are responsive and that an Alteryx Promote instance can scale with user and model demands, the size of the supporting infrastructure should be carefully considered. This article will review the main factors that influence system performance and provide guidance on sizing for an initial deployment. Details should always be reviewed with your Alteryx representative to understand how these recommendations might be customized to best fit your use case and requirements.
For helpful background information, we recommend reading the article An Overview of Promote's Architecture.
Promote has a minimum system requirement of 3 machines, each with 4 cores and 16 GB of RAM.
This configuration is probably suitable for most development environments. However, for a production environment, there are several factors that should be understood to ensure the environment is sized properly. The following sections will introduce these factors, as well as how they impact resource consumption.
Number of Models
Each model deployed to Promote creates a new Docker service with two containers by default. These containers must always be ready to receive prediction requests, so they continuously consume system resources (CPU, RAM, and disk). The size and complexity of each model also affect how much of those resources it consumes.
The replication factor setting determines the number of Docker tasks & containers to deploy to support a given model. The default value is two, which means each model will have two containers running to service incoming prediction requests. This provides redundancy in the case of failure and allows for two prediction requests to be handled concurrently. For a development environment, this number could be reduced to 1. For high demand production environments, the number of replicas can be increased to handle the workload with fast response times.
The default replication factor is 2.
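The relationship between model count, replication factor, and running containers can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the even spread of containers across machines is an assumption, not a statement about how Promote or Docker actually schedules tasks.

```python
import math

def total_containers(num_models: int, replication_factor: int = 2) -> int:
    """Containers running when every model uses the same replication factor."""
    return num_models * replication_factor

def containers_per_machine(num_models: int, machines: int,
                           replication_factor: int = 2) -> int:
    """Containers on the busiest machine, assuming an even spread (an assumption)."""
    return math.ceil(total_containers(num_models, replication_factor) / machines)

print(total_containers(20))           # 40
print(containers_per_machine(20, 3))  # 14
```

With the default replication factor, 20 models mean 40 containers that must stay resident, or roughly 14 per machine on a 3-machine cluster under the even-spread assumption.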
Promote models are deployed as Docker containers which are running instances of Docker images. These containers utilize memory, so the size of the model (the amount of memory it consumes) directly affects the number of models that can be deployed onto a machine. The goal is to maximize the number of models that can be deployed on a machine while still performing well and not exhausting the memory.
A useful command for understanding the memory size of models deployed in a Promote environment is below.
# docker ps -q | xargs docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
0ca3a80f0f43 joeuser-irene-2.1.wb3brhtaz6zxe1l40a5gtmcvp 0.00% 102.9MiB / 31.37GiB 0.32% 123MB / 115MB 9.1MB / 120kB 29
71813bc3a453 sallyuser-CampaignRStudioNew-1.1.zkpsdf6cu9as4n19zhmxlzqe8 6.88% 228.3MiB / 31.37GiB 0.71% 45MB / 40.8MB 0B / 76.8kB 33
In this example, Sally's model is using about 228 MiB of memory while Joe's is using only about 103 MiB. These figures fluctuate somewhat as prediction requests come and go, but they give a good idea of each model's memory requirements.
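To total the memory used by all model containers, the MEM USAGE column can be pulled out of each row of the output. The sketch below is a hypothetical helper, not part of Promote. It assumes the default `docker stats` column layout shown above, where the fourth whitespace-separated field is the memory usage.

```python
import re

# Multipliers for the binary units printed in the MEM USAGE column.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

def mem_usage_mib(stats_row: str) -> float:
    """Return the MEM USAGE value of one `docker stats` row, in MiB."""
    usage = stats_row.split()[3]  # e.g. "102.9MiB"
    match = re.fullmatch(r"([\d.]+)(B|KiB|MiB|GiB)", usage)
    if match is None:
        raise ValueError(f"unexpected memory field: {usage!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit] / UNITS["MiB"]

# The two sample rows from the output above.
rows = [
    "0ca3a80f0f43 joeuser-irene-2.1.wb3brhtaz6zxe1l40a5gtmcvp 0.00% "
    "102.9MiB / 31.37GiB 0.32% 123MB / 115MB 9.1MB / 120kB 29",
    "71813bc3a453 sallyuser-CampaignRStudioNew-1.1.zkpsdf6cu9as4n19zhmxlzqe8 6.88% "
    "228.3MiB / 31.37GiB 0.71% 45MB / 40.8MB 0B / 76.8kB 33",
]

total = sum(mem_usage_mib(row) for row in rows)
print(f"{total:.1f} MiB")  # 331.2 MiB
```

Dividing a machine's usable memory by a typical per-container figure from this output gives a rough upper bound on how many containers the machine can hold.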
Frequency of Prediction Requests
Prediction requests to models require CPU time to execute. The more frequently prediction requests come in, the higher the resulting CPU utilization will be.
Complexity of Models
Not all models are created equal. A simple logistic regression doesn't require a large amount of resources to derive a prediction and typically responds in a few milliseconds. However, a much more complicated model such as Gradient Boosting could crunch away for several seconds consuming CPU cycles before responding.
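The combined effect of request frequency and model complexity can be approximated as request rate times CPU time per prediction, divided by the available cores. The sketch below is a back-of-the-envelope estimate rather than a Promote formula. The function name and workload numbers are hypothetical, and it assumes each prediction keeps one core busy for its full duration.

```python
def cpu_utilization(requests_per_second: float,
                    cpu_seconds_per_prediction: float,
                    cores: int) -> float:
    """Fraction of total CPU capacity consumed; above 1.0 means overload."""
    return requests_per_second * cpu_seconds_per_prediction / cores

# Hypothetical workloads on a 12-core cluster, both at 50 requests per second.
simple = cpu_utilization(50, 0.005, 12)  # model answering in a few milliseconds
heavy = cpu_utilization(50, 2.0, 12)     # model computing for about two seconds

print(f"{simple:.1%}")  # 2.1%
print(f"{heavy:.1%}")   # 833.3%
```

At the same request rate, the simple model barely registers while the heavy model would need several times the available cores, which is why model complexity matters as much as request volume.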
Prediction Logging
Prediction Logging stores information about every prediction request, which can be viewed in the Model Overview -> History tab. This includes the date and time of the request, the inputs to the model prediction, and the predicted output from the model. This data is stored for up to 14 days, with a maximum storage size of 50GB.
Prediction logging can be set for Dev & Staging models, or for Production models.
Prediction Logging can be useful for testing and validating models in a Development / Staging environment, or for auditing in a Production environment. There is, however, a performance cost to enabling Prediction Logging, because all the prediction information is logged and sent to Elasticsearch to be indexed for quick retrieval later. The overhead of this setting varies with many of the factors mentioned above, but as an example, we've seen CPU utilization double when enabling Prediction Logging in high-volume environments.
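The 14-day and 50GB limits interact with request volume: at high volumes, the size cap may be reached before 14 days have passed. The following sketch estimates how many days of history fit under the cap. The average record size is a made-up figure for illustration, and the assumption that history is simply bounded by whichever limit is hit first is ours, not documented Promote behavior.

```python
GB = 10**9  # decimal gigabytes, an assumption about how the cap is measured

def days_of_history(requests_per_day: int, bytes_per_record: int,
                    cap_bytes: int = 50 * GB, max_days: int = 14) -> float:
    """Days of prediction history retained before either limit applies."""
    days_until_cap = cap_bytes / (requests_per_day * bytes_per_record)
    return min(max_days, days_until_cap)

# Hypothetical 2,000-byte log records.
print(days_of_history(1_000_000, 2_000))  # 14 (time limit reached first)
print(days_of_history(5_000_000, 2_000))  # 5.0 (size cap reached first)
```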
Every organization's use cases, models, workloads, and data sizes are different. The recommendations below are suggested starting points, based on the following assumptions:
- Default Replication Factor = 2
- Small to Medium model sizes. (100MB - 200MB memory requirements)
- Simple model complexity. (predictions only take a few ms)
- Infrequent Prediction Requests.
- Prediction Logging ON
We've categorized the environments as Small, Medium, or Large based on the total number of models (which includes Dev, Staging, and Production).
| Environment | Total Number of Models | Minimum Recommended Cores | Minimum Recommended RAM | 3-Machine Config | 4-Machine Config | 6-Machine Config |
|---|---|---|---|---|---|---|
| Small | 0 - 19 | 12 | 48 GB | 4 cores / 16 GB | N/A | N/A |
| Medium | 20 - 39 | 24 | 96 GB | 8 cores / 32 GB | 6 cores / 24 GB | 4 cores / 16 GB |
| Large | 40 + | 36 | 144 GB | 12 cores / 48 GB | 9 cores / 36 GB | 6 cores / 24 GB |
You'll notice a 4GB per core ratio. In most cases, this works well. If model sizes are much larger than what is shown here then we recommend increasing that ratio, perhaps to 8GB per core. You'll also notice for the Medium and Large environments, there are options for 3 large machines or multiple smaller machines. Let's dive into the reasons to consider each.
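The per-machine figures in the table follow from dividing the total core count evenly across machines and applying the GB-per-core ratio. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def per_machine(total_cores: int, machines: int,
                gb_per_core: int = 4) -> tuple[int, int]:
    """Return (cores, GB of RAM) per machine for an even split of total_cores."""
    if total_cores % machines:
        raise ValueError("total cores do not divide evenly across machines")
    cores = total_cores // machines
    return cores, cores * gb_per_core

print(per_machine(24, 4))                 # (6, 24)  Medium, 4 machines
print(per_machine(36, 3))                 # (12, 48) Large, 3 machines
print(per_machine(36, 3, gb_per_core=8))  # (12, 96) Large, larger models
```

The last call shows the effect of raising the ratio to 8GB per core for larger models: the core count is unchanged while the RAM per machine doubles.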
Scaling Vertically (adding cores) vs Horizontally (adding machines)
Scaling Promote vertically by increasing the number of processing cores on each machine will provide your predictive models with additional CPU power when processing incoming requests without additional machines to manage.
Scaling Promote horizontally by adding additional machines allows you more redundancy and protection in the event a machine does fail, and also allows you to configure a higher replication factor for the models. This will provide greater concurrency to support models with high volume prediction request rates.
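One way to compare the two approaches is to look at how much capacity remains when a single machine fails. The sketch below uses a hypothetical helper and assumes cores are split evenly across machines. It ignores how Docker reschedules containers after a failure.

```python
def cores_after_failure(total_cores: int, machines: int, failed: int = 1) -> int:
    """Cores still available after `failed` machines go down (even split assumed)."""
    return total_cores * (machines - failed) // machines

# The same 36 total cores spread over 3, 4, or 6 machines.
for machines in (3, 4, 6):
    print(machines, cores_after_failure(36, machines))
# 3 24
# 4 27
# 6 30
```

With the same total core count, losing one of 3 large machines removes a third of the capacity, while losing one of 6 smaller machines removes only a sixth.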
This article has shown some of the key factors to consider when designing a new Alteryx Promote environment. Understanding these factors will help design an environment that can support fast response times to user prediction requests and scale as workload demands increase. Details should always be reviewed with your Alteryx representative to understand how these recommendations might be customized to best fit your use case and requirements.