Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
MichaelSu
Alteryx Alumni (Retired)

_____$$$$_________$$$$_____
___$$$$$$$$_____$$$$$$$$___
_$$$$$$$$$$$$_$$$$$$$$$$$$_
$$$$$$$$$$$$$$$$$$$$$$$$$$$
$$$$$$$$$$$$$$$$$$$$$$$$$$$
_$$$$$$$$$$$$$$$$$$$$$$$$$_
__$$$$$$$$$$$$$$$$$$$$$$$__
____$$$$$$$$$$$$$$$$$$$____
_______$$$$$$$$$$$$$_______
__________$$$$$$$__________
____________$$$____________
_____________$_____________

 

1. Lineage Tracking

 

Every Alteryx workflow is backed by XML. As you drag and drop building blocks onto the canvas, Alteryx writes the corresponding XML behind the scenes. That XML exposes each tool under a Node designation, so you can parse out each tool and follow which fields are processed along the way to reconstruct the lineage you are after. A good, publicly available example of this is the Auto Documentation App. When run against a workflow, it captures nearly everything (barring encrypted passwords and the like): the data sources involved, connection types, the specific tools used, how and when they are used, the selections made in each tool, annotations, and more.
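
As a rough illustration of the parsing idea described above, the sketch below reads a small, hand-written stand-in for a workflow file with Python's standard library. The sample XML is a simplified assumption about the layout (Node elements carrying a ToolID, Connection elements with an Origin and a Destination); real workflows contain far more detail, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written stand-in for a workflow file (an assumption,
# not a real export): an input tool feeding a select tool feeding an output.
SAMPLE_WORKFLOW = """
<AlteryxDocument yxmdVer="2019.1">
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput" />
    </Node>
    <Node ToolID="2">
      <GuiSettings Plugin="AlteryxBasePluginsGui.AlteryxSelect.AlteryxSelect" />
    </Node>
    <Node ToolID="3">
      <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileOutput.DbFileOutput" />
    </Node>
  </Nodes>
  <Connections>
    <Connection>
      <Origin ToolID="1" Connection="Output" />
      <Destination ToolID="2" Connection="Input" />
    </Connection>
    <Connection>
      <Origin ToolID="2" Connection="Output" />
      <Destination ToolID="3" Connection="Input" />
    </Connection>
  </Connections>
</AlteryxDocument>
"""


def list_tools(root):
    """Map each tool ID to the plugin name recorded on its Node."""
    tools = {}
    for node in root.iter("Node"):
        gui = node.find("GuiSettings")
        plugin = gui.get("Plugin", "unknown") if gui is not None else "unknown"
        tools[node.get("ToolID")] = plugin
    return tools


def list_connections(root):
    """Return (origin tool ID, destination tool ID) pairs in document order."""
    edges = []
    for conn in root.iter("Connection"):
        origin = conn.find("Origin")
        dest = conn.find("Destination")
        if origin is not None and dest is not None:
            edges.append((origin.get("ToolID"), dest.get("ToolID")))
    return edges


def upstream_of(tool_id, edges):
    """Walk the connections backwards to collect every tool feeding tool_id."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen = set()
    stack = [tool_id]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Parsing the sample with `ET.fromstring(SAMPLE_WORKFLOW)` and passing the result to `list_connections` gives the edges that `upstream_of` needs; for a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.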

 

Note: You can view the XML as you build your workflow by navigating to Options → User Settings → Edit User Settings → Advanced → ‘Display XML in Properties Window’ and then viewing the XML in the Configuration pane. You can also convert a .yxmd workflow to .txt and read the text directly. Pro tip: some users do this to change the Designer version the workflow is tied to.

 

2. Data Governance

 

With respect to data accessibility, Alteryx adheres to all existing database administration rules and procedures when connecting to your data sources. Any time a connection is set up within an Alteryx workflow, credentials are used, whether through pass-through authentication or a named user account. Access to data assets is governed by the database or network permissions that a company’s IT department sets up, down to the local drive level.

 

With respect to data management, the Alteryx Engine connects to the data sources when it pulls in data. The Engine then performs all the functions described by the workflow in-memory. Since Alteryx does not require data to be stored in an additional format, you are able to eliminate ‘spreadmarts’, which helps ensure reliability and consistency across your data ecosystem.

 

3. Audit & Logging

 

Alteryx Server provides several types of logs (Service, Gallery, Engine, UI) and allows the Admin to specify the level of detail for logging. Logs let you trace errors, warnings, information and debugging messages. The logs are saved on the Server in the following locations:

 

  • Gallery logs are stored in C:\ProgramData\Alteryx\Gallery\Logs and will have a name like: alteryx-2017-09-13.csv
  • Server / service logs are stored in C:\ProgramData\Alteryx\Service and will have a name like: AlteryxServiceLog_2017-06-04_00-46-07.log, with the latest log being named: AlteryxServiceLog.log
  • Engine logs are disabled by default but can be enabled in the server configuration tool.
  • System level logs are available in the Windows event viewer.
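
When reviewing the dated Gallery logs listed above, a small helper can pick out the most recent file. This is a minimal sketch that assumes the file names follow the alteryx-YYYY-MM-DD.csv pattern shown in the example; the function names are hypothetical and nothing here is part of Alteryx itself.

```python
import re
from datetime import date

# Matches names shaped like the example above: alteryx-2017-09-13.csv
GALLERY_LOG_PATTERN = re.compile(r"^alteryx-(\d{4})-(\d{2})-(\d{2})\.csv$")


def gallery_log_date(filename):
    """Return the date embedded in a Gallery log name, or None if it does not match."""
    match = GALLERY_LOG_PATTERN.match(filename)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None


def latest_gallery_log(filenames):
    """Return the name of the most recent dated Gallery log, or None if there is none."""
    dated = [(gallery_log_date(name), name) for name in filenames]
    dated = [(log_date, name) for log_date, name in dated if log_date is not None]
    return max(dated)[1] if dated else None
```

On the Server itself, the list of names could come from `os.listdir(r"C:\ProgramData\Alteryx\Gallery\Logs")`.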

 

Audit logs are also accessible via API endpoints. Server tracks changes to system entities including AppInfo, Collection, Credential, Subscription, and User. Any update to these entities triggers the creation of an AuditEvent record. You can retrieve these records via a public Admin API endpoint.

 


 

4. Flexible Deployment & Scalability

 

One of the great things about Alteryx Server is that it is easy to set up and configure on-premises or in the cloud (AWS, Azure, GCP, etc.). It can also be set up and deployed within the same day! Once Server is deployed, Alteryx makes it simple and flexible to scale to meet increasing demand. Our Server architects love that you do not have to shut down the entire Server in order to add Worker nodes: you simply install the additional node and point it at the Controller using the Controller's unique security token so that it can stand up and communicate.

 

5. Comprehensive & Customizable Usage Reporting

 

Administrators require monitoring and reporting capabilities, whether to track usage, adhere to a data governance program, ensure best practices and optimal performance, or gain greater overall visibility into their Alteryx Server deployments. Alteryx makes it easy to either create your own workflow or dashboard via the MongoDB Input Tool or leverage the pre-built Server Usage Report. With the ‘build it yourself’ option, users often customize the metrics they want to report on and schedule these workflows to run at a regular cadence. The pre-built Usage Report contains a packaged workflow that runs in Alteryx Designer and outputs to Tableau to present four basic dashboards:

 

  1. User Access: Studios, Users, Types of Users
  2. Job Analysis: # of Apps & Workflows Run, Runtime Details, Queue Analysis
  3. Content: Studio & Collection Content, Authors, Download Metrics
  4. Scheduling: Scheduling Times, Day of Week, Time of Day, Peak / Off-Peak Hours
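
For the ‘build it yourself’ route, the kind of metric behind the Scheduling dashboard can be sketched in a few lines. The example below assumes job start times have already been exported (for instance by a workflow using the MongoDB Input Tool) as ISO 8601 strings; that format and the function names are assumptions for illustration, not the Server's actual schema.

```python
from collections import Counter
from datetime import datetime


def runs_by_hour(start_times):
    """Count job runs per hour of day from ISO 8601 start-time strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in start_times)


def peak_hours(start_times, top_n=3):
    """Return the busiest hours of the day, most runs first (ties go to the earlier hour)."""
    counts = runs_by_hour(start_times)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [hour for hour, _ in ranked[:top_n]]


# Made-up start times, purely for illustration.
SAMPLE_START_TIMES = [
    "2017-09-13T08:05:00",
    "2017-09-13T08:40:00",
    "2017-09-13T14:00:00",
    "2017-09-14T08:15:00",
    "2017-09-14T22:30:00",
]
```

Calling `peak_hours(SAMPLE_START_TIMES, top_n=1)` on the made-up data singles out the 08:00 hour, which has three of the five runs.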

 

6. Segregation of Duties across the Platform

 

Within the Gallery you can manage users and provision various levels of data access based on the user role. User roles determine the users’ level of access to Gallery users and assets. Gallery users can have one of the following roles:

 

  • Curator: Gallery Admin who can access the Admin interface to perform administrative tasks
  • Artisan: Artisans can publish, run, and share workflows in their Private Studio & shared Collections
  • Member: Members can run workflows that are shared with them via Collections
  • Viewer: Viewers can run public workflows on the homepage and in districts
  • No Access: Blocks all access to the Gallery. Typically used in Galleries leveraging Windows Authentication or SAML

 

Collections also have their own granular permissions for finer-grained control.

 


 

7. Advanced Access & Security Control

 

Access is configured and managed by an Admin role, and there can be multiple Admins on the Alteryx Server. An Admin may associate Windows AD groups with access to execute workflows. Admins can also change the default role for new user accounts and configure whether users may create their own accounts. Server supports centrally managed data connection strings, defined in the Server Admin panel and shared with named users or user groups.

 

8. Gallery Authentication Options

 

Built-In:

  • Users are created and managed within Alteryx Server
  • No dependencies or integration needed with any other system
  • Users are managed manually through a Server’s web interface
  • Users can be allowed to sign-up and have a pre-defined permission level

 

Active Directory Integration (including Kerberos):

  • Authenticate using AD credentials
  • Users can be automatically added from AD or prevented from automatic access
  • In an AD Forest setup, full bi-directional trust is required

 

SAML Integration:

  • Supports SAML v2.0 integration for Single Sign-On
  • PingOne, Okta, and Azure AD have been validated and other SAML providers may work as well
  • Users can be automatically added from Identity Provider (allowed to sign-in) or prevented from automatic access
  • User authentication is handled by Identity Provider and user access is managed there

 


 

9. Sandbox & Production Environments

 

A standard deployment of Alteryx Server has both Sandbox and Production environments. Most users work against the Sandbox, and only production-ready workflows are promoted to Production. With a Sandbox, you can test workflows before productionizing them and also test upgrades so that content is not mistakenly broken by new Server releases. In turn, IT is able to easily implement a process around publishing to Production.

 

From a best practice perspective, we see two general areas of usage for lower environments (sandboxes).

 

  1. Software Development Life Cycle process control
    1. A separate environment used as the first destination for publishing workflows, where iterative development testing, data connection testing, workflow performance / load monitoring, peer review, and user acceptance testing occur
    2. Many groups have analogous lower environments for data sources that are only available to the sandbox server, with production data connections only available on the production server.
    3. Sometimes compliance protocols dictate that these efforts happen on more than one sandbox for complete process separation.
    4. Even on a single sandbox, from a technical perspective we are protecting the production environment from ever experiencing performance degradations due to poor workflow design (cartesian products, malformed macros that loop endlessly, inefficient data queries, etc.)
  2. Upgrades & Patching
    1. Allows for regression testing of workflows in parallel with Production
    2. Covers both scenarios of Alteryx version upgrades and OS patches or other environmental changes

 

10. Workflow Execution Credentials & Restrictions

 

Credentials are used for running workflows on the Alteryx Server that contain inputs from databases and various file locations, as defined in the Input / Output Tool. There are three available options for user credentials:

 

  1. User is not required to specify credentials
  2. User must specify their own credentials
  3. Always run this workflow with these credentials

 

It is also possible to restrict the type of functionality permitted to run, either globally or at a workflow level. Row-level restrictions can also be implemented in three ways:

 

  1. Managed by the database
  2. Hybrid: Using a ‘Permissions’ table in the DB and filtering in the workflow
  3. Managed within the workflows
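
The hybrid option above can be illustrated outside Alteryx with a small, self-contained sketch: a permissions table lives in the database, and the query (standing in for the filter step of a workflow) joins against it for the current user. The table names, column names, and data are all hypothetical, and an in-memory SQLite database is used only so the example runs anywhere.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up sales and permissions tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, amount INTEGER);
        CREATE TABLE permissions (username TEXT, region TEXT);
        INSERT INTO sales VALUES ('East', 100), ('West', 250), ('East', 75);
        INSERT INTO permissions VALUES
            ('alice', 'East'), ('bob', 'West'), ('bob', 'East');
        """
    )
    return conn


def rows_visible_to(conn, username):
    """Return only the sales rows whose region is granted to username."""
    query = """
        SELECT s.region, s.amount
        FROM sales AS s
        JOIN permissions AS p ON p.region = s.region
        WHERE p.username = ?
        ORDER BY s.region, s.amount
    """
    # The username is bound as a parameter rather than formatted into the SQL.
    return conn.execute(query, (username,)).fetchall()
```

With this layout, a user with no rows in the permissions table sees nothing, and granting or revoking access is a data change in the table rather than a change to the workflow.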

 

Conclusion

 

A successful modern analytics platform can deliver both IT control and end-user autonomy and agility. Collaboration between the business and IT is critical to the success of a platform adoption and deployment. IT knows how to manage data and the business knows how to use the insights to drive business decisions. With that, the above highlights key functions of the Alteryx platform that IT loves!


Contributors: David Hare, David Matyas, Nic Morales, Scott Anderson, Ian Coe             


Reference

 


To find out more about how line of business leaders can partner with IT to enable workers across the organization to make data-informed decisions, join Nick Bignell, Director of Data Science Service at UBS, for a Webinar + Q&A where you'll see how he brought disparate data sources together and created a unified analytics strategy for UBS that empowers 12 different business functions.

 
