Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
DavidHa
Alteryx
Alteryx

Will it Alteryx.png

 

Welcome back to another entry in the Will It Alteryx? series. In this edition, we'll be looking at Data Warehousing as a Service (DWaaS). We'll start with what is DWaaS and then explore if and how Alteryx can integrate with this technology.

 

What is DWaaS?

 

Organizations typically leverage many different data sources for everyday business needs ranging from transactions, HR, Finance, etc... Performing analysis on this data directly against the source systems often creates contention with the ingestion of new data and operational reporting. Further, data analysis often requires joining data from disparate systems. Therefore, organizations have adopted data warehouses as a place to store, aggregate, organize, and analyze their data without impacting internal sources. Creating a new Data Warehouse can often take up to 6-12 months before it's operational. In an effort to shorten time to value, cloud-based Data Warehousing as a Service was born. 

 

Data Warehousing as a Service is a relatively new player to the "as a Service" game, joining veterans such as Infrastructure as a Service and Software as a Service. DWaaS provides many benefits over an on-premises data warehouse, including scalability, elasticity, redundancy, unlimited storage, and time to value. Two of the most popular DWaaS providers are Amazon Redshift and Snowflake. Interest in these two providers has grown rapidly over the last five years. 

 

 

Cloud based Data Warehousing solutions are quickly gaining interest.Cloud based Data Warehousing solutions are quickly gaining interest.

 

Now that we've discussed what DWaaS is, let's get to the topic at hand.

 

 

Will it Alteryx?

 

Alteryx can work with popular DWaaS offerings such as AWS Redshift and Snowflake in a variety of ways. Options include:

 

#1. Pull data from the data warehouse down to process in a workflow on the Alteryx Designer or Alteryx Server machine using the standard Input tool. All that is needed for this is a valid ODBC connection to the data warehouse.

 

An example AWS Redshift ODBC ConfigurationAn example AWS Redshift ODBC Configuration

 

DevOps or IT will need to allow inbound access to the data warehouse. The example below shows an inbound rule on the VPC Security Group being used by Redshift:

 

A simple inbound rule allowing all traffic to port 5439A simple inbound rule allowing all traffic to port 5439

 

 

#2. Writing data from a workflow on the Alteryx Designer or Server machine to the data warehouse using the standard Output tool. Again, all that is needed for this is a valid ODBC connection to the data warehouse.

 

Snowflake configurations typically require a Warehouse name and Role, rather than a Database name.Snowflake configurations typically require a Warehouse name and Role, rather than a Database name.

 

It can take a while to write data to Snowflake or Redshift using this method, which leads us to number 3...

 

 

#3. Writing data with a Bulk Connection. The Snowflake and Redshift Bulk Connection write options both work by first writing the output data to Amazon S3 object storage, then ingesting the data into the data warehouse from S3. The write times are significantly faster using the Bulk Connection compared to the standard write directly to Snowflake or Redshift.

 

Note, to use this feature you must have an AWS Access and Secret Key which are offered through AWS IAM (Identity and Access Management). These can be obtained from the "My Security Credentials" link under your AWS account:

 

AWS_credentials.png

 

When selecting a Data Connection on the Output Tool, make sure to select the Bulk option:

 

Bulk_Connection2.png

 

The Bulk Connection can be easily configured with your ODBC Driver details, S3 Access Key and Secret Key, and S3 Bucket Name. Note that AWS Regions created after Jan 30, 2014, only support Signature Version 4 for Authentication. 

 

An example Redshift Bulk Connection configuration.An example Redshift Bulk Connection configuration.

 

 

#4. Saving the best for last, Redshift and Snowflake are both supported In-Database data sources. The only requirement here is that the ODBC Driver must be 64-bit. With In-Database tools, all the processing occurs in the database, eliminating data movement, and taking advantage of the scalable, highly performant infrastructure of the DWaaS.

In-DB_Workflow.PNG

 

Final Thoughts

 

With more and more organizations adopting cloud-based data warehousing solutions, it's important to understand how to analyze that data and unlock business insights. With this knowledge in hand, I encourage you to visit your local grocery store and observe the data warehouse in action as blogged by @GarySp.

 

Until next time, please leave a comment below if you have any questions or suggestions for additional "Will it Alteryx" topics.

David Hare
Senior Manager, Solutions Architecture

David has the privilege to lead the Alteryx Solutions Architecture team helping customers understand the Alteryx platform, how it integrates with their existing IT infrastructure and technology stack, and how Alteryx can provide high performance and advanced analytics. He's passionate about learning new technologies and recognizing how they can be leveraged to solve organizations' business problems.

David has the privilege to lead the Alteryx Solutions Architecture team helping customers understand the Alteryx platform, how it integrates with their existing IT infrastructure and technology stack, and how Alteryx can provide high performance and advanced analytics. He's passionate about learning new technologies and recognizing how they can be leveraged to solve organizations' business problems.