High-Availability Controller

ZacharyH
Alteryx

Introduction

Alteryx Server provides a fully scalable architecture that allows an organization to scale Alteryx to automate data analytics, tackle bigger projects, process larger datasets and put self-service data analytics into the hands of more decision makers. From scaling Worker nodes to Gallery nodes to the MongoDB persistence layer, Alteryx Server allows organizations to efficiently manage their automated and self-service data analytics needs.

As the number of jobs needing to be executed increases, an organization can scale their Workers. As the number of self-service Gallery users increases, organizations can scale the Gallery. To ensure the availability of your Alteryx Server environment and to protect your workflows and data from disaster, organizations can deploy a user-managed instance of MongoDB and enable advanced features such as Replica Sets and Sharding.

When talking to organizations about scaling, the most common questions we hear are “how can we scale the Controller within our Alteryx Server environment?” and “how can we set up a redundant Controller to prevent an outage within our Alteryx Server environment?” Those questions are the focus of this article.

Before we get started, it is important to note that only a single Controller can be active at a given time, so a redundant Controller requires an active-passive setup. Thankfully, using Failover Clustering, which is built into Microsoft Windows Server, organizations can automate the failover of the Controller. Failover Clustering is responsible for monitoring the AlteryxService and the availability of the server on the network. If the AlteryxService fails or the server goes down, Failover Clustering automates the failover of the Controller functionality to a secondary server.

The following instructions detail how to set up an Alteryx Server environment with redundant Controllers and how to automate the failover of the Controller in the event of an outage. These instructions are intended for scaled-out Alteryx Server environments. Enabling automated failover of the Controller requires configuration changes on all Gallery and Worker nodes. The configuration changes are detailed in the steps below.

Pre-Requisites

Install the Failover Clustering Feature

The first step in configuring a high-availability Controller is to install Microsoft Failover Clustering. The following steps will need to be completed on each of the Controller failover nodes in the cluster. These instructions apply to Windows Server 2016.

  1. Open Server Manager
  2. From the Manage menu, select Add Roles and Features
    1-AddRolesFeatures.png
  3. On the “Select Installation Type” screen, select “Role-based or feature-based installation” and click next
    2-InstallationType.png
  4. On the “Select destination server” screen, select the server you are currently logged in to from the Server Pool section and click next
    3-DestinationServer.png
  5. On the “Server Roles” screen click next to proceed to the “Features” screen
  6. On the “Features” screen, select “Failover Clustering”
    1. Upon selecting Failover Clustering, click Add Features when the following window appears
      4-AddFeatures.png
  7. Click Next to proceed with adding the Failover Clustering Feature
    5-SelectFeatures.png
  8. On the “Confirm installation selections” screen, ensure the Failover Clustering Feature is listed and click Install
    6-ConfirmSelections.png
  9. Once the Failover Clustering Feature has been installed, close the “Add Roles and Features Wizard” and repeat these steps on each of the remaining cluster nodes

Primary Controller Configuration

The first step is to configure your primary controller, if it has not already been configured.

  1. Run the Alteryx System Settings console as Admin
  2. For the Environment >> Setup Type, select Custom >> Enable Controller
  3. From the Controller >> General section, copy the Controller Token for later use
    1. If this is a new server configuration, you will need to complete the setup to generate a Controller Token
      7-PrimaryConfiguration.png
  4. In the Controller >> Persistence section, set the Database Type to User-managed MongoDB
    1. Enter the Host, Database Name, Username and Password in the Controller >> Persistence >> Database section
      8-PrimaryPersistence.png
  5. Complete the remaining Controller configuration steps and click finish to apply the settings and start the AlteryxService with the latest configuration

Failover Controller(s) Configuration

The next step is to configure your failover controller(s). To ensure a seamless failover, each controller in the cluster will need to use the same controller token and Storage Keys for encryption.

  1. For each failover controller, complete the Controller setup per the primary controller setup
  2. After configuring each failover node, you will need to set the Controller Token on each of the Controller failover nodes
    1. Open an Administrative Command Prompt
    2. In the command prompt, navigate to the Alteryx Server installation directory (default: C:\Program Files\Alteryx\bin)
    3. Run the following command, replacing {Controller Token} with the value copied from step 3 of the Primary Controller Configuration: AlteryxService.exe setserversecret={Controller Token}
      9-SetControllerToken.png
    4. Confirm that the Controller Token was set by running the following command: AlteryxService.exe getserversecret
    5. If running, stop the AlteryxService by running the following command: net stop AlteryxService
  3. Finally, you will need to copy the StorageKeysEncrypted value from the primary Controller to each of the nodes in the failover cluster
    1. Open Notepad on the primary Controller
    2. From the File menu, select Open
    3. Open the RuntimeSettings.xml file located at C:\ProgramData\Alteryx
    4. From the Controller section, copy the StorageKeysEncrypted key
      1. Ex. BwIAAACkAACunN7PkZcdMRM2N5pW+NRyqCdBiLuVqWRJELqix6Dg3ZAitUq9BbdlSLS8Ez+me45oiNGd8m81spqMvkNz3f/cyZX8oJVo2itY4JN/RXp4iJJ+obK96UtL8h2k2nq5XZ9GEDANIurhnm5Ww/nKxUw7O0LXtqftXpXLkbD5n/+YAs58iZlKz22dEklMzXQmc5+LBX+5D4O0FAMcD0M+u06vC1zHMmTHSU9G+D6isaVgxQtHMOLP0zTzA+97UDkE0pQOK2IQPnSh58UpHEmQn6K284pLFaKNd89dZuQ43kwo3Gmp+qz3Qp//BkzMMa2Li8eXOmmxTSLpjS+syBiglS5Zu1QFgnxKnQRknex+IGRbCTbva1CIQPqAr/kCK/GNuFnPV4ESJqrs0abbV42vmXdc9Utwy0iQ5ZLO6z1AEAioGj58fgi/rTTr+qqqf4tDk2zyJqyH/fAlxgfMO4z1cZjDHt3vmLNr/U6xyr8WLlH1TiGTBg3c3s9zMlXvd9ZifFfoI62QVEFtH6TCrhTLxsIphbj/VOtLtKaYT2SMtFz/XkxA8Ns5s4Ex5gv6jJJXihVXFxXaeQZJdQBAbVM607LTAMWN8r3Vdr5GYUBCL7i8wwYVx/4GpwU7qEMWgG0sFuFSpw9+54b2NJk7avBxIU5EVaFsbBfWRULzazwjVaA5e93NZ6Q1qm/FiCfAMSV+DUubWManxJbcttn9vEz7upQCO7DnZoxdLr4oYLm+w5MOf5QUX3l/zqIiUcbDQHa5q/gHOQwwCYvnOUMkEEHZ5kba
    5. On each of the Failover nodes in the cluster:
      1. Open Notepad as an Administrator
      2. From the File menu, select Open
      3. Open the RuntimeSettings.xml file located at C:\ProgramData\Alteryx
      4. Replace the StorageKeysEncrypted key with the value copied from the primary Controller
      5. Save the RuntimeSettings.xml file
      6. Close Notepad
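
For environments with several failover nodes, the Notepad steps above could also be scripted. The following is a minimal sketch, not an Alteryx-provided tool. It assumes that RuntimeSettings.xml has a Controller element directly under the root element and that the key is stored in a child element named StorageKeysEncrypted, as the steps above suggest. The function names are hypothetical. Back up RuntimeSettings.xml before trying anything like this, since rewriting the file with ElementTree may drop comments and change formatting.

```python
import xml.etree.ElementTree as ET


def read_storage_keys(settings_path):
    """Return the StorageKeysEncrypted value from the Controller section.

    Assumes <Controller> is a direct child of the root element.
    """
    root = ET.parse(settings_path).getroot()
    node = root.find("./Controller/StorageKeysEncrypted")
    if node is None or not (node.text or "").strip():
        raise ValueError(f"No Controller/StorageKeysEncrypted value in {settings_path}")
    return node.text.strip()


def write_storage_keys(settings_path, value):
    """Set StorageKeysEncrypted in the Controller section of a settings file."""
    tree = ET.parse(settings_path)
    controller = tree.getroot().find("./Controller")
    if controller is None:
        raise ValueError(f"No Controller section in {settings_path}")
    node = controller.find("StorageKeysEncrypted")
    if node is None:
        # The failover node has no key yet, so create the element.
        node = ET.SubElement(controller, "StorageKeysEncrypted")
    node.text = value
    tree.write(settings_path, encoding="utf-8", xml_declaration=True)
```

A possible usage, run as an Administrator with the AlteryxService stopped on the failover node, would be to call read_storage_keys on a copy of the primary Controller's RuntimeSettings.xml and pass the result to write_storage_keys with the failover node's C:\ProgramData\Alteryx\RuntimeSettings.xml path.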

Create a Failover Cluster

Once you have added the Failover Clustering Feature to each node in the cluster, the next step is to create a cluster. These steps can be completed from any of the servers that the Failover Clustering Feature has been enabled on.

  1. Open Server Manager
  2. From the Tools menu, select “Failover Cluster Manager”
    10-FOClusterMgr.png
  3. Within the Actions pane of the “Failover Cluster Manager console”, select “Create Cluster”
    11-CreateNewCluster.png
  4. Within the “Create Cluster Wizard” click next on the “Before You Begin” screen
  5. On the “Select Servers” screen, enter the server name of each node that will be added to the cluster and click the Add button. Once you have added each of the servers that will be included in the cluster, verify the servers are listed in the Selected servers window and click next to proceed
    12-SelectServers.png
  6. On the “Validation Warning” screen, select “Yes. When I click Next, run configuration validation tests, and then return to the process of creating the cluster” and click Next
    13-ValidationWarning.png
  7. After clicking next on the “Validation Warning” screen, the “Validate a Configuration Wizard” will be launched
    14-ValidationBegin.png
  8. On the “Testing Options” screen, select “Run all tests (recommended)” and click Next
    15-ValidationOptions.png
  9. On the “Confirmation” screen, validate that all nodes being added to the Cluster are listed in the “Servers to Test” section and click Next to run the validation tests
    16-ValidationConfirmation.png
  10. Once the validation process is complete, review the Summary to ensure there are no errors that need to be addressed and click Finish to return to the cluster creation process
    1. Optionally, you can click “View Report” to review the detailed validation report
  11. On the “Access Point for Administering the Cluster” screen of the “Create Cluster Wizard”, enter a Cluster Name. The Cluster Name will be added to DNS within the Active Directory domain and will be used for Administering the cluster and any roles owned by the Cluster. Once you have entered a Cluster Name, click next to proceed to the confirmation screen.
    17-AccessPoint.png
  12. On the “Confirmation” screen, verify the cluster name and that each node being added to the cluster is listed in the Node section and then click next to proceed
    18-CreateConfirmation.png
  13. Upon clicking next on step 12, the new Cluster will be configured and added to DNS
    19-CreatingCluster.png
  14. Once the cluster has been configured, you should receive a Summary screen stating “You have successfully completed the Create Cluster Wizard.” Click Finish to close the “Create Cluster Wizard”
    20-CreatingClusterComplete.png

Add a Cluster Role

Now that we have created a cluster, we need to add a Cluster Role. These steps can be completed from any of the servers that the Failover Clustering Feature has been enabled on.

  1. Open Server Manager
  2. From the Tools menu, select “Failover Cluster Manager”
  3. Within the “Failover Cluster Manager” console, expand the newly created Cluster, highlight Roles on the left and from within the Actions menu on the right, click Configure Role
    21-AddClusterRole.png
  4. In the “High Availability Wizard”, click Next on the “Before you Begin” screen
  5. On the “Select Role” screen, highlight the “Generic Service” Role and click Next
    22-SelectRole.png
  6. On the “Select Service” screen, select Alteryx Service and click next to proceed
    23-SelectServices.png
  7. On the “Client Access Point” screen, enter a DNS name that will be used for accessing the cluster role. This is the DNS name that will be used when configuring Gallery and Worker nodes to access the High Availability Controller cluster.
    24-ClientAccessPoint.png
  8. Click Next on the “Select Storage” and “Replicate Registry Settings” screens
  9. On the “Confirmation” screen, verify the settings and click Next
    25-RoleConfirmation.png
  10. Upon clicking next on step 9, the Cluster Role will be created and added to DNS. Once the High Availability role has been created, you should receive a Summary screen stating “High availability was successfully configured for the role.” Click Finish to close the “High Availability Wizard”
    26-RoleSummary.png

Microsoft Failover Clustering will now manage the state of the AlteryxService.exe on each of the nodes in the cluster. The AlteryxService.exe will be started on the “Owner” (active) node and the failover nodes will be in a stopped state. In the event of a failure on the “Owner” node, Microsoft Failover Clustering will start the AlteryxService.exe on one of the failover nodes and automatically direct traffic to the active Alteryx Controller.

Gallery Node Configuration

Now that we have our High Availability Controller cluster running, the next step is to complete the setup of Alteryx Gallery and Worker nodes. To complete the setup, configure the Gallery and Worker nodes as you normally would in a distributed Alteryx Server environment. When you reach the Controller configuration, proceed as follows:

  1. On the “Remote Controller” screen, enter the DNS host name that was created in step 7 of the Add a Cluster Role section and the Controller Token obtained in step 3 of the Primary Controller Configuration section of these instructions
    28-GalleryRemoteController.png
  2. Click the Test button to confirm compatibility
    29-GalleryRemoteControllerTestSuccess.png
    1. If you do not receive a Success notification:
      1. Confirm that all nodes in the Alteryx Server environment (Gallery, Controller, and Worker nodes) are running the same version of Alteryx Server.
      2. Confirm there are no firewalls blocking TCP port 80 on the Controller nodes
  3. Complete the remainder of the Alteryx System Settings as required for each node in the Alteryx Server environment, then click Finish on the “Finalize Your Configuration” screen to apply the settings and start the AlteryxService using the newly applied settings.
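
If the Test button does not report success, a quick way to narrow down a firewall problem is to attempt a plain TCP connection to the Controller cluster role from the Gallery or Worker node. The sketch below uses only the Python standard library. The function name is hypothetical, and the default of port 80 is taken from the troubleshooting step above. It only shows whether a TCP connection can be opened; it does not validate the Controller Token or version compatibility.

```python
import socket


def can_connect(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

For example, can_connect("controller-role.example.com") would be called with the DNS name entered on the “Client Access Point” screen (the name shown here is a placeholder). A False result points to name resolution, routing, or a firewall rather than to the Alteryx configuration itself.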

Worker Node Configuration

Now that we have our High Availability Controller cluster running, the next step is to complete the setup of Alteryx Gallery and Worker nodes. To complete the setup, configure the Gallery and Worker nodes as you normally would in a distributed Alteryx Server environment. When you reach the Controller configuration, proceed as follows:

  1. On the “Remote Controller” screen, enter the DNS host name that was created in step 7 of the Add a Cluster Role section and the Controller Token obtained in step 3 of the Primary Controller Configuration section of these instructions
    31-WorkerRemoteController.png
  2. Click the Test button to confirm compatibility
    32-WorkerRemoteControllerTestSuccess.png
    1. If you do not receive a Success notification:
      1. Confirm that all nodes in the Alteryx Server environment (Gallery, Controller, and Worker nodes) are running the same version of Alteryx Server.
      2. Confirm there are no firewalls blocking TCP port 80 on the Controller nodes
  3. Complete the remainder of the Alteryx System Settings as required for each node in the Alteryx Server environment, then click Finish on the “Finalize Your Configuration” screen to apply the settings and start the AlteryxService using the newly applied settings.

Automated Failover Testing

Once the Controller has been configured for High-Availability and automated failover, it is highly recommended that you test the configuration to confirm that the automated failover succeeds. There are several methods you can use to test the automated failover.

  1. Manual Failover – These steps can be performed from any server within the Failover Cluster
    1. Open Server Manager
    2. From the Tools menu, select “Failover Cluster Manager”
    3. Within the “Failover Cluster Manager” console, expand the newly created Cluster.
      1. If the cluster is not displayed, from within the Actions pane, click “Connect to Cluster…” and follow the on-screen prompts to connect to the newly created cluster
    4. Within the Roles section of the newly created cluster, right click the role and select Move >> Select Node…
      33-RoleTesting.png
    5. Select one of the available Cluster Nodes. The “Move Clustered Role” window will only display the available destination nodes; the current “Owner” node will not be displayed.
      34-SelectNode.png
    6. Click the OK button to initiate the failover to the selected Cluster Node
    7. Once the “Owner” node has changed and the status is “Running”, proceed with verifying the failover of the Controller. For details on what to verify and test, refer to the Failover Verification and Testing section
      35-NodeTransferred.png
  2. Power outage simulation
    1. Virtual Machines
      1. Option 1 – System shutdown
        1. Log in to the current “Owner” node via Remote Desktop
        2. Open the Windows Start menu
        3. Select “Shut down” to power off the server
      2. Option 2 – Power off the guest (requires access to the virtual machine hypervisor)
        1. Open the virtual machine hypervisor
        2. Within the hypervisor, locate and select the “Owner” node
        3. With the “Owner” node selected, power off the virtual machine
    2. Physical Server (requires access to the physical server)
      1. Power off or unplug the server to force a failover
    3. Once the “Owner” node has changed and the status is “Running”, proceed with verifying the failover of the Controller. For details on what to verify and test, refer to the Failover Verification and Testing section
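
While running any of the tests above, it can be useful to measure how long the Controller is unavailable. The sketch below is one possible approach and is not part of Alteryx Server or Failover Clustering. The function name and parameters are hypothetical, and the probe argument is an assumption: any callable that returns True while the Controller cluster role answers (for example, a TCP connection attempt to the role's DNS name) and False otherwise.

```python
import time


def time_outage(probe, interval=1.0, max_checks=600,
                sleep=time.sleep, clock=time.monotonic):
    """Poll probe() and return the seconds between the first failed check
    and the next successful one, or None if no complete outage is seen.

    The result is only as precise as the polling interval.
    """
    went_down_at = None
    for _ in range(max_checks):
        is_up = probe()
        now = clock()
        if not is_up and went_down_at is None:
            went_down_at = now          # first failed check: outage begins
        elif is_up and went_down_at is not None:
            return now - went_down_at   # first success after the outage
        sleep(interval)
    return None
```

Start the polling from a Gallery or Worker node just before initiating the failover. The sleep and clock parameters exist only so the function can be exercised without real waiting.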

Failover Verification and Testing

When verifying and testing the automated failover, it is important to verify and test all functionality utilized within your Alteryx Server environment. This would include, but not be limited to:

  1. Verify the AlteryxService.exe has started on the new Owner node (manual failover)
    1. Open Task Manager
    2. Switch to the Services tab
    3. Verify the AlteryxService.exe status is Running
  2. Verify the AlteryxService.exe has stopped on the old Owner node (manual failover)
    1. Open Task Manager
    2. Switch to the Services tab
    3. Verify the AlteryxService.exe status is Stopped
  3. Test the execution of Alteryx Workflows/Analytic Apps
    1. On-demand Workflow/Analytic App executions (Gallery)
    2. Scheduled Workflow/Analytic App executions
    3. Workflows/Analytic Apps utilizing Run As credentials and/or requiring users to specify their credentials
    4. Workflows utilizing Gallery Database Connections (Gallery >> Admin >> Database Connections)
  4. Test the ability to publish Alteryx Workflows/Analytic Apps

Known Issues/Concerns

  1. In-flight jobs – In the event you have jobs running when the controller failover occurs,
Comments
Kong
7 - Meteor

Great for all these details.

 

Would there be any ports requirements on the Windows failover cluster?

davisbrs
5 - Atom

Hi @ZacharyH. I am here to ask the same question as above. Are there any port requirements on the Windows failover cluster?

mtornga
6 - Meteoroid

mtornga_0-1587651583491.png

We found it important to set the Startup Type of the service to Manual on all Controller nodes. We had a situation where the controllers rebooted and more than one started running Alteryx Service due to the automatic start setting. Autostart needs to be disabled so that the Role can manage what is running. 

 

Having more than one controller online will cause jobs to queue infinitely. 

mse139
8 - Asteroid

Looks like the known issues section is cut off and we are wondering what we are to expect for in-progress jobs.  We ran some tests and it seems any job running during the failover gets re-queued.

SophiaF
Alteryx

The missing end text E9HKMSR

OllieClarke
15 - Aurora

Hi @ZacharyH thanks for the blog! 

 

Is having Windows Active Directory a pre-requisite for this process? 

I got stuck at the "Create a failover cluster" step. I think it's because we don't have AD set up, but wanted to confirm this...