Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
pmaier1971
Alteryx

Safeguarding personal information can be challenging, and complying with applicable privacy laws and regulations means navigating a complex, rapidly changing landscape. Alteryx gives customers the ability to control where their data goes and to configure their on-prem server to meet their GDPR requirements. In this blog post, we outline how Alteryx Server (on-prem) processes data, so that customers can make an informed choice about how to set up an on-prem server architecture that meets stringent standards for compliance, security, and data protection. Note that we focus only on the implications of GDPR for the Alteryx Server architecture and do not cover every configuration that may be relevant to hardening and governing a server environment. 

 

Disclaimer: The content of this community post is intended to provide a general guide and discussion on the topic. It does not constitute legal advice and should not be treated as such. You should not act upon the information provided without consulting with your own legal counsel for any specific queries or concerns. 

 

Backdrop: What is GDPR?

 

The General Data Protection Regulation 2016/679 (GDPR) is a European privacy law intended to enhance individuals’ control over, and rights regarding, their personal data. It became enforceable in 2018 and harmonizes data protection law across European Union (EU) member states, creating consistent rules on how personal data can be processed, used, and exchanged securely. The regulation became a model for many other laws, including new privacy laws in the United Kingdom, Turkey, Mauritius, Chile, Japan, Brazil, South Korea, and others, so the basic principles laid out in this post can also apply in other countries that have privacy or other laws governing the transfer of personal data across borders. 

 

GDPR applies to all organizations established in the EU and to organizations, whether or not established in the EU, that process the personal data of EU data subjects in connection with offering goods or services to data subjects in the EU or monitoring behavior taking place in the EU. Personal data is any information relating to an identified or identifiable natural person.  

 

The regulation is centered around data controllers (organizations that determine why and how personal data is processed), data processors (organizations that process data on behalf of a data controller, like a cloud service provider), and data subjects (the identified or identifiable natural persons to whom personal data relates, as defined in Article 4 of the Regulation). Under GDPR, Alteryx can be both a data processor and a data controller. Articles 24 and 32 require organizations to implement appropriate technical and organizational measures to secure the data they process, to be able to demonstrate compliance with GDPR, and to review and regularly test those measures. The transfer of personal data from the EU to a non-EU country requires one of the following: 

  • An adequacy decision, in which the European Commission formally recognizes that the country’s privacy laws provide a level of protection essentially equivalent to the GDPR; 
  • Appropriate safeguards, under which data subjects have enforceable rights and effective legal remedies; or
  • A derogation for the specific situation (such as explicit consent given by the data subject). 

 

Simply put: if data travels overseas, then the EU standards of data protection must travel with it. 

 

How Does This Apply to You?

 

Since privacy regulation can be a complex topic, let us consider a concrete example. A US-based company employs staff in Europe and wants to use Alteryx to process its EU payroll data. Because payroll records contain personal data, the company needs to comply with GDPR, which means that appropriate safeguards must be in place for transmitting and storing that data. 

 

Let’s walk through several typical Alteryx Server setups, highlighting options to manage and minimize transfers of personal data outside the EU. 

 

Option 1: An EU-Based Alteryx Server and Local Data Processing

 

Figure 1 depicts a setup using a local Alteryx on-prem server and local data processing. The data is hosted in EU-based data warehouses and is imported into Alteryx either directly or via locally stored Excel spreadsheets.

 


Figure 1: EU-Based Alteryx Server and Local Data Processing

 

We assume that EU data is access-controlled, so US-based users cannot access personal data stored in the EU. A “Run-As” account adds an extra layer of security. If the workflow is configured to require the executing user to supply their own Windows user account, the workflow executes as that user, and its access to network file shares is governed by that user’s permissions. Provided data access rights are set appropriately, only EU-based users of the workflow can then read from or write to a given network file share location, and US-based users cannot. In that case, as the diagram shows, all data processing is done in the EU.  
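To make the Run-As idea concrete, here is a toy Python model of the permission check. It is a sketch under stated assumptions, not how Alteryx Server or Windows implements access control: the share path, the account names, and the `SHARE_ACL` mapping are hypothetical, and in practice these permissions are enforced by Windows on the file share itself.

```python
from dataclasses import dataclass

# Hypothetical ACL: the Windows accounts allowed to read/write each network share.
# In a real deployment this lives in the share's Windows permissions, not in code.
SHARE_ACL: dict[str, set[str]] = {
    r"\\eu-fileserver\payroll": {"EU\\analyst1", "EU\\analyst2"},
}

@dataclass
class WorkflowRun:
    workflow: str
    run_as: str  # Windows account the executing user supplied

def can_access(run: WorkflowRun, share: str) -> bool:
    """The workflow can touch the share only if its run-as account is on the ACL."""
    return run.run_as in SHARE_ACL.get(share, set())

share = r"\\eu-fileserver\payroll"
print(can_access(WorkflowRun("EU payroll", run_as="EU\\analyst1"), share))  # True
print(can_access(WorkflowRun("EU payroll", run_as="US\\analyst9"), share))  # False
```

The design point is that access follows the account the workflow runs as, not the workflow itself, so the same published workflow succeeds for an EU analyst and fails for a US analyst.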

 

One potential issue is that if a workflow generates a report, a US-based user could download it. Depending on the design of the report, that user could potentially be accessing personal data. This can be addressed by creating two copies of the workflow (with and without the report) and storing them in different collections, so US users can only access the version that does not produce the report. A peer-review process to confirm that these safeguards are in place further reduces the chance of this happening. Overall, this setup helps ensure that no output files containing personal data are provided to non-EU users, with three applicable safeguards: the workflow design, the server’s authorization model, and a peer-review process to confirm that these best practices were followed. 
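The two-copies approach can be modeled the same way. In this sketch, the collection names, user names, and the `COLLECTIONS` structure are all hypothetical stand-ins for the Server's collection permissions; the point is only that a user sees the union of the workflows in the collections they belong to.

```python
# Hypothetical collections: each holds workflows and the users allowed to run them.
COLLECTIONS: dict[str, dict[str, set[str]]] = {
    "EU Payroll": {
        "workflows": {"payroll_with_report"},
        "members": {"eu_analyst"},
    },
    "Global Payroll": {
        "workflows": {"payroll_no_report"},
        "members": {"eu_analyst", "us_analyst"},
    },
}

def visible_workflows(user: str) -> set[str]:
    """Union of the workflows in every collection the user is a member of."""
    visible: set[str] = set()
    for collection in COLLECTIONS.values():
        if user in collection["members"]:
            visible |= collection["workflows"]
    return visible

print(sorted(visible_workflows("eu_analyst")))  # ['payroll_no_report', 'payroll_with_report']
print(sorted(visible_workflows("us_analyst")))  # ['payroll_no_report']
```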

 

This is the setup that Alteryx generally recommends. Provided that data is locally stored and file access is controlled, this setup minimizes unnecessary cross-border transfers. 

 

Option 2: A US-Based Alteryx Server and US Data Processing

 

For clients with an existing US server, it may be tempting to leverage the existing infrastructure to process EU personal data. Despite all data being encrypted in transit and at rest, this setup may still result in additional cross-border transfers.  

 

Figure 2 shows why. The core issue is that, even though data is encrypted, Alteryx workflows can optionally be designed in ways that cause data values to end up in MongoDB, the Server’s persistence layer, which in this setup is located in the US.  

 

  • For instance, if a user uploads a file as part of the workflow, the uploaded file is stored in MongoDB, subject to a configured retention period.  
  • If a job fails, temporary files may be left on the worker node and may need to be cleaned up.  
  • When building a workflow, users may also use files containing personal data and choose to package those files with the published workflow; these embedded files may then be stored in MongoDB as well.  
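One way to support the review of embedded files is to inspect a packaged workflow before it is published. The sketch below assumes that the package (for example, a `.yxzp` file) is a standard ZIP archive and that file extensions are a reasonable signal for bundled data; the `DATA_EXTENSIONS` list is an assumption you would adapt to your own data formats.

```python
import zipfile
from pathlib import PurePosixPath
from typing import IO, Union

# Assumed list of extensions that usually indicate bundled data rather than
# workflow logic. Adapt it to the data formats your organization uses.
DATA_EXTENSIONS = {".csv", ".xlsx", ".xls", ".yxdb", ".json", ".parquet"}

def find_embedded_data(package: Union[str, IO[bytes]]) -> list[str]:
    """List archive members whose extension suggests a bundled data file."""
    with zipfile.ZipFile(package) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # ZIP member names use forward slashes, so PurePosixPath parses them.
            if PurePosixPath(name).suffix.lower() in DATA_EXTENSIONS
        )
```

For example, `find_embedded_data("EU_Payroll.yxzp")` (a hypothetical file name) returns the names of any bundled data files, and an empty list suggests that nothing obviously data-like was packaged. An extension check will not catch every case, so it supplements rather than replaces a human review.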

 

None of these workflow design patterns is necessary for the standard operation of the Server. Because these optional designs exist, however, they pose a risk that data values could end up in MongoDB. 
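For the failed-job case, a scheduled sweep of the worker's temporary directory is one possible safeguard. The following sketch is hypothetical: the directory you point it at and the default 24-hour age threshold are assumptions, and it only reports candidates unless `dry_run=False` is passed.

```python
import time
from pathlib import Path
from typing import Optional

def stale_temp_files(temp_dir: Path, max_age_hours: float = 24.0,
                     now: Optional[float] = None) -> list[Path]:
    """Return files under temp_dir last modified more than max_age_hours ago."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    return sorted(p for p in temp_dir.rglob("*")
                  if p.is_file() and p.stat().st_mtime < cutoff)

def clean_temp_files(temp_dir: Path, max_age_hours: float = 24.0,
                     dry_run: bool = True) -> list[Path]:
    """Report (dry run, the default) or delete stale files; returns the files found."""
    stale = stale_temp_files(temp_dir, max_age_hours)
    for path in stale:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return stale
```

For example, `clean_temp_files(Path(r"D:\AlteryxTemp"), max_age_hours=48)` (a hypothetical path) lists stale files without deleting anything. The threshold matters: files belonging to jobs that are still running should not be removed, so it should comfortably exceed your longest expected job runtime.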

 


Figure 2: US-Based Alteryx Server and US Data Processing

 

Several ways exist to manage cross-border data flows. First and foremost, a robust workflow review process by an Alteryx-certified user before migrating the workflow to the server helps ensure that the workflow’s design follows best practices. In addition, making best practice guides available to users and training them on the importance of GDPR when designing workflows acts as an additional safeguard. Educating users on managing workflow assets, avoiding storing input files in the workflow, and not producing any downloadable reports substantially reduces the risk of unintended transfers. Moreover, where possible, we recommend leveraging in-DB tools to help limit moving data between jurisdictions.  
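A review process like this can be supported by a simple checklist. The sketch below encodes the three practices named above; the `WorkflowSummary` fields are hypothetical and would be filled in by the reviewer, not read from any Alteryx API.

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowSummary:
    # Hypothetical fields a reviewer records for each workflow before publishing.
    name: str
    embedded_files: list[str] = field(default_factory=list)
    produces_download: bool = False
    uses_in_db_tools: bool = False

def review(wf: WorkflowSummary) -> list[str]:
    """Return review findings; an empty list means nothing was flagged."""
    findings: list[str] = []
    if wf.embedded_files:
        findings.append(f"embedded input files: {', '.join(wf.embedded_files)}")
    if wf.produces_download:
        findings.append("produces a downloadable report")
    if not wf.uses_in_db_tools:
        findings.append("consider In-DB tools to limit data movement")
    return findings

print(review(WorkflowSummary("EU payroll", produces_download=True)))
```

A checklist like this only captures what the reviewer records; it is a way to make the review consistent, not a substitute for it.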

 

With these mitigating factors in place, the chance of unintended cross-border transfers is reduced. That said, this setup requires more controls and oversight, and Alteryx would typically recommend a different setup for processing data subject to GDPR. 

 

Option 3: A US-Based Alteryx Server With an EU Worker Node

 

For some of our clients, a tempting “middle ground” might be to use a US-based Alteryx platform with a dedicated EU worker node to handle personal data. Workflow tagging can direct execution to specific worker nodes, keeping data processing and storage local. 

 

This is the most complex setup, depicted in Figure 3. We assume that once migrated to the server, the workflow can be triggered by both EU- and US-based users. For easier reference, we mark each transfer risk with a red circle in the figure and summarize both the transfer risks and the mitigating factors in Table 1 below. 

 

  • First, even if workflow tagging is enabled, there is no guarantee that every workflow will be tagged correctly. The risk that an incorrectly tagged workflow is executed on the US worker node is denoted R1 (“Risk #1”) and marked by a red circle in Figure 3.  
  • Second, as before, depending on the workflow design, input and output files can be stored in MongoDB. Because MongoDB is located outside the EU in this setup, we label these risks R2 and R3. 
  • If a workflow is incorrectly tagged and executed in the US, the additional risk of leaving temporary files on a US worker node is denoted R4. 
  • Lastly, depending on the workflow design and asset management setup, embedded files may end up in MongoDB if users do not manage workflow assets carefully (R5). 
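Risk R1 can be illustrated with a simplified routing model. The worker names and the rule that an untagged workflow may run on any worker are assumptions made for this sketch, not a description of how Alteryx Server assigns jobs; check your own worker and tag configuration.

```python
from typing import Optional

# Hypothetical worker pool: worker name -> (region, tags assigned to that worker).
WORKERS: dict[str, tuple[str, set[str]]] = {
    "us-worker-1": ("US", set()),
    "eu-worker-1": ("EU", {"EU"}),
}

def eligible_workers(workflow_tag: Optional[str]) -> list[str]:
    """Simplified routing rule (assumption): a tagged workflow runs only on workers
    carrying that tag; an untagged workflow may run on any worker."""
    if workflow_tag is None:
        return sorted(WORKERS)
    return sorted(name for name, (_, tags) in WORKERS.items() if workflow_tag in tags)

def may_run_outside_eu(workflow_tag: Optional[str]) -> bool:
    """True if any eligible worker is located outside the EU (risk R1)."""
    return any(WORKERS[name][0] != "EU" for name in eligible_workers(workflow_tag))

print(may_run_outside_eu("EU"))  # False: only the EU worker is eligible
print(may_run_outside_eu(None))  # True: a missing tag lets the US worker pick it up
```

Under these assumptions, a correctly tagged workflow stays in the EU, while an untagged one could be picked up by the US worker, which is the R1 scenario.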

 


Figure 3: A US-Based Alteryx Server With an EU Worker Node

 

Several options exist to mitigate these transfer risks (see Table 1), and, as before, a robust workflow review process is the centerpiece of a risk mitigation strategy. 

 

  • A workflow best-practice guide, including the recommendation to avoid downloadable workflow reports, helps reduce the possibility that personal data is accessed outside the EU.  
  • As before, using a “Run-As” account and placing copies of workflows that do produce outputs in separate collections provide additional safeguards. 
  • In-DB tools push more of the workload into local databases and reduce data movement.  
  • A peer review by an Alteryx-certified user is recommended to reduce the likelihood of data files being packaged with the workflow.  
  • Finally, training associates on the importance of privacy regulation and its impact on workflow design builds understanding of the relevant design patterns and helps ensure compliance. 

 


Table 1: Potential Risks and Mitigating Factors

 

Summary

 

The debate about privacy and how to safeguard personal data is rapidly evolving, and this post can only scratch the surface.  Careful planning of the Alteryx architecture is paramount and can help establish effective privacy practices. 

 

In this post, we reviewed some common setups and outlined the risks associated with them. While the exact recommendations may vary depending on the client’s setup, deploying Alteryx Server into the jurisdiction or region imposing local data processing rules is typically the best way to manage these risks.

 

 

Special thanks for helpful comments to Dan Hilton (@DanH) and Brian Quinn (@BQ22) on earlier versions of this post.