Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
JoshH
Alteryx Alumni (Retired)

In my last blog post, “Self-Service Analytics: The Tug-of-War between Flexibility and Governance,” I wrote about how flexibility and governance shouldn’t be an “either/or” decision when it comes to self-service analytics – you can have both. This has become an increasingly popular topic as organizations gain more access to self-service tools for data preparation and analytics. But how does Alteryx handle analytic governance? And what are the recommendations for ensuring data quality and data management?

 

To better understand data governance in Alteryx, I recently sat down with experts from our engineering and product management teams and asked them a few questions. Here’s what they had to say:

 

How does Alteryx handle data authentication and authorization?

 

KORY CUNNINGHAM: When it comes to data governance strategies, one of the first things you have to ask is how the data is being accessed. Alteryx respects any existing authentication and authorization already applied at the database level. When Alteryx connects to the data, it uses usernames and passwords that already exist in your systems, or it can use your network identity through pass-through authentication to access the data in line with the role-based permissions you have in place (row level, cell level or object level). So we respect those controls from the moment we access the data.

 

Second, we add another layer of governance to the analytics that your analysts share, something we call “analytic governance.” It gives you the ability to respect those data access rights at the analytic layer and make the analytics available only to the set of users or departments that your administrators define.

 

GARY SCHWARTZ: Windows Authentication on a Private Gallery makes data security easier to manage than just email authentication, especially in larger organizations with different roles and privileges. While role access and permissions to analytics can be managed in the Gallery through either built-in authentication or Windows Authentication, organizations often prefer to leverage their existing Active Directory group and user permissions to make authentication and sharing much easier.

 

 

How is the data being managed? Is the data persisted?

 

GARY SCHWARTZ: When it comes to the Alteryx platform and how the data is being managed, there is a lot happening, but one thing that is unique about Alteryx is that we don’t create a separate persistence layer to store any of the data as it’s being processed. We are essentially a multi-tenant pipeline that consumes much of the data in-memory without persisting it. The only time we persist information is temporarily, if the processing needs to spill to disk, but that sandbox is deleted once processing completes, and no data is left behind. We do persist data that users upload as inputs to their workflows so that it can be used repeatedly, but that data is secured and only available to the user who uploaded it.

 

 

STEVE AHLGREN: As Gary mentioned, Alteryx Server supports multi-tenancy from the ground up, processing data in-memory and writing and reading temporary data in a sandboxed environment. A single instance of Alteryx Server can process multiple workflows at the same time, and none of the underlying Alteryx Engine instances will share data across that Server. This allows different departments to use the same platform without risking data being accessible between those departments. For example, an HR group using sensitive data can be on the same Alteryx Server as the marketing department without having to worry about access to its sensitive analytics and data.

 

 

JC RAVENEAU: Another way that Alteryx handles data is through our In-Database processing capabilities. As Gary and Steve mentioned, typically the data is processed in memory inside of Alteryx. But depending on what you are doing with the data and how much data you are extracting from your source systems, it might make sense to do that processing on the database itself. Our In-DB suite of tools essentially allows you to build queries on Hadoop, Redshift, Azure SQL Data Warehouse and other sources without having to know how to code or write SQL statements. From a data governance standpoint, In-Database processing ensures that database security is fully enforced and raw data never leaves the data warehouse.
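
To make the in-database idea concrete, here is a minimal sketch in Python. It is not how the Alteryx In-DB tools are implemented; it only illustrates the general pattern of pushing an aggregation down to the database so that just the summarized result crosses the wire. The built-in sqlite3 module stands in for a real data warehouse, and the table name, column names and sample rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database that stands in for a data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE salaries (department TEXT, employee TEXT, salary REAL)"
    )
    # Hypothetical sample rows, used only for illustration.
    conn.executemany(
        "INSERT INTO salaries VALUES (?, ?, ?)",
        [
            ("HR", "alice", 70000.0),
            ("HR", "bob", 80000.0),
            ("Marketing", "carol", 65000.0),
        ],
    )
    conn.commit()
    return conn


def average_salary_in_database(conn):
    """Push the aggregation to the database; only summary rows are returned."""
    rows = conn.execute(
        "SELECT department, AVG(salary) FROM salaries "
        "GROUP BY department ORDER BY department"
    ).fetchall()
    return dict(rows)


def average_salary_client_side(conn):
    """Pull every raw row out of the database and aggregate locally."""
    by_department = {}
    for department, salary in conn.execute(
        "SELECT department, salary FROM salaries"
    ):
        by_department.setdefault(department, []).append(salary)
    return {d: sum(v) / len(v) for d, v in by_department.items()}


if __name__ == "__main__":
    conn = build_demo_db()
    print(average_salary_in_database(conn))
    print(average_salary_client_side(conn))
```

Both functions produce the same averages, but only the second one moves individual salary rows out of the database. That difference is the governance point being made here: with the pushed-down query, the database's own security applies to the query and the raw records stay where they are.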

 

How can you track data lineage within Alteryx?

 

KORY CUNNINGHAM: The Alteryx drag-and-drop interface provides a visual view of the transformations taking place within the workflow. You can see what data sources are being brought in, how they are ingested, how they are being prepped and blended, and any type of analytics being applied. But behind the interface, as you are building out your workflow, you are building out XML that represents that workflow. That XML can be parsed in Alteryx or in other tools to get insight into what's happening in the workflow.

Also, with the logging capability, you can get a view into what’s going on in the system. You can see which workflows are being run, which data sources are being accessed, and the number of records read and written. Just from looking at the logs, you can also see utilization, or how much the workflows are impacting a data source.

 

 

JC RAVENEAU: Just to expand on what Kory said, Alteryx provides an open XML pipeline of readable data that is fully traceable. Customers who wish to integrate Alteryx in a wider data governance framework can parse the XML and integrate the Alteryx workflow metadata into their existing metadata management solutions.
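
As a rough illustration of parsing workflow XML for lineage, the Python sketch below reads a small hand-written sample and lists each tool and the connections between tools. The sample is heavily simplified, and the element and attribute names it relies on (Node, ToolID, GuiSettings, Plugin, Configuration, File, Connection, Origin, Destination) are assumptions about the file layout, so check them against your own workflow files before building on this. The function name extract_lineage is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample. Real workflow files carry far more detail,
# and the layout assumed here should be verified against an actual file.
SAMPLE_WORKFLOW = """
<AlteryxDocument>
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput" />
      <Properties>
        <Configuration>
          <File>sales.csv</File>
        </Configuration>
      </Properties>
    </Node>
    <Node ToolID="2">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Filter.Filter" />
    </Node>
    <Node ToolID="3">
      <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileOutput.DbFileOutput" />
      <Properties>
        <Configuration>
          <File>filtered_sales.csv</File>
        </Configuration>
      </Properties>
    </Node>
  </Nodes>
  <Connections>
    <Connection>
      <Origin ToolID="1" Connection="Output" />
      <Destination ToolID="2" Connection="Input" />
    </Connection>
    <Connection>
      <Origin ToolID="2" Connection="True" />
      <Destination ToolID="3" Connection="Input" />
    </Connection>
  </Connections>
</AlteryxDocument>
"""


def extract_lineage(xml_text):
    """Return (tools, edges) describing the tools and data flow in a workflow."""
    root = ET.fromstring(xml_text.strip())

    tools = {}
    for node in root.iter("Node"):
        gui = node.find("GuiSettings")
        file_el = node.find("Properties/Configuration/File")
        tools[node.get("ToolID")] = {
            # Which tool this node is, according to its plugin identifier.
            "plugin": gui.get("Plugin", "") if gui is not None else "",
            # File path for input/output tools, None for other tools.
            "file": file_el.text if file_el is not None else None,
        }

    edges = []
    for conn in root.iter("Connection"):
        origin = conn.find("Origin")
        destination = conn.find("Destination")
        if origin is not None and destination is not None:
            edges.append((origin.get("ToolID"), destination.get("ToolID")))

    return tools, edges


if __name__ == "__main__":
    tools, edges = extract_lineage(SAMPLE_WORKFLOW)
    for tool_id, info in tools.items():
        print(tool_id, info["plugin"], info["file"])
    for source, target in edges:
        print(source, "->", target)
```

The resulting dictionary of tools and list of edges is a simple graph that could be loaded into a metadata management system, for example to answer which output files ultimately depend on which input sources.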

 

 

How do you ensure secure sharing of data?

 

KORY CUNNINGHAM: I talked a little about this in the Alteryx Server FAQ blog, but you can fully manage the data access rights within Alteryx to ensure secure sharing of data. The administrator can manage the Studios, which are the workgroups’ spaces, and who is authorized in each Studio. In each Studio you can also set up the credentials so that any user who runs a workflow will run it at the permission level they have been granted. And as an administrator, you have full control over these data access rights.

The Studios and Collections become important when you start publishing workflows. Workflows that are uploaded to a Gallery are published by default to the Studio of the user who publishes them, so only users in that Studio have access. But we have a few options for making these workflows available to others within the Gallery. You can make a workflow available to everyone in the Gallery by making it public, so any user who has access can run that workflow and get the same results. If you need to provision access more specifically, you can do so by way of Collections, which are an easy way to share a group of apps with a group of users or with individual users by leveraging your Active Directory.

 

In summary

Remember, choosing between governance and flexibility isn’t a binary decision. You don’t have to sacrifice one for the other. From data access and data management to authorization and authentication, Alteryx provides a number of ways to put analytic governance in place so that you maintain data quality and data security, adhere to your corporate standards, and comply with broader data governance best practices. Big thanks to Kory, Gary, Steve and JC for sitting down with me and answering some questions on data governance.

 

Thanks for reading,

Josh Howard

Josh Howard
Sr. Director, Product Management

Josh Howard is a technology product veteran covering trends in information management and based in the Denver/Boulder, CO area. He has more than 20 years of experience in developing product and go-to-market strategies across a wide variety of data technologies including business intelligence, analytics, data integration, and database development. He is currently the Sr. Director of Product Management at Alteryx. You can follow Josh on Twitter at @Joshoward


Comments
BARTONCONNIE
JC RAVENEAU said: "Alteryx provides an open xml pipeline of readable data that is fully traceable. Customers who wish to integrate Alteryx in a wider data governance framework can parse the XML and integrate the Alteryx workflow metadata into their existing metadata management solutions." I am attempting to do exactly what you described. It is rather easy to parse the Alteryx XML. But I have several questions about how to interpret the XML. Who can I ask to get answers?