Welcome back to another entry in the Will It Alteryx? series. In this edition, we'll be looking at Data Warehousing as a Service (DWaaS). We'll start with what is DWaaS and then explore if and how Alteryx can integrate with this technology.
Organizations typically leverage many different data sources for everyday business needs ranging from transactions, HR, Finance, etc... Performing analysis on this data directly against the source systems often creates contention with the ingestion of new data and operational reporting. Further, data analysis often requires joining data from disparate systems. Therefore, organizations have adopted data warehouses as a place to store, aggregate, organize, and analyze their data without impacting internal sources. Creating a new Data Warehouse can often take up to 6-12 months before it's operational. In an effort to shorten time to value, cloud-based Data Warehousing as a Service was born.
Data Warehousing as a Service is a relatively new player to the "as a Service" game, joining veterans such as Infrastructure as a Service and Software as a Service. DWaaS provides many benefits over an on-premises data warehouse, including scalability, elasticity, redundancy, unlimited storage, and time to value. Two of the most popular DWaaS providers are Amazon Redshift and Snowflake. Interest in these two providers has grown rapidly over the last five years.
Now that we've discussed what DWaaS is, let's get to the topic at hand.
Alteryx can work with popular DWaaS offerings such as AWS Redshift and Snowflake in a variety of ways. Options include:
#1. Pull data from the data warehouse down to process in a workflow on the Alteryx Designer or Alteryx Server machine using the standard Input tool. All that is needed for this is a valid ODBC connection to the data warehouse.
DevOps or IT will need to allow inbound access to the data warehouse. The example below shows an inbound rule on the VPC Security Group being used by Redshift:
#2. Writing data from a workflow on the Alteryx Designer or Server machine to the data warehouse using the standard Output tool. Again, all that is needed for this is a valid ODBC connection to the data warehouse.
It can take a while to write data to Snowflake or Redshift using this method, which leads us to number 3...
#3. Writing data with a Bulk Connection. The Snowflake and Redshift Bulk Connection write options both work by first writing the output data to Amazon S3 object storage, then ingesting the data into the data warehouse from S3. The write times are significantly faster using the Bulk Connection compared to the standard write directly to Snowflake or Redshift.
Note, to use this feature you must have an AWS Access and Secret Key which are offered through AWS IAM (Identity and Access Management). These can be obtained from the "My Security Credentials" link under your AWS account:
When selecting a Data Connection on the Output Tool, make sure to select the Bulk option:
The Bulk Connection can be easily configured with your ODBC Driver details, S3 Access Key and Secret Key, and S3 Bucket Name. Note that AWS Regions created after Jan 30, 2014, only support Signature Version 4 for Authentication.
#4. Saving the best for last, Redshift and Snowflake are both supported In-Database data sources. The only requirement here is that the ODBC Driver must be 64-bit. With In-Database tools, all the processing occurs in the database, eliminating data movement, and taking advantage of the scalable, highly performant infrastructure of the DWaaS.
With more and more organizations adopting cloud-based data warehousing solutions, it's important to understand how to analyze that data and unlock business insights. With this knowledge in hand, I encourage you to visit your local grocery store and observe the data warehouse in action as blogged by @GarySp.
Until next time, please leave a comment below if you have any questions or suggestions for additional "Will it Alteryx" topics.
David has the privilege to lead the Alteryx Solutions Architecture team helping customers understand the Alteryx platform, how it integrates with their existing IT infrastructure and technology stack, and how Alteryx can provide high performance and advanced analytics. He's passionate about learning new technologies and recognizing how they can be leveraged to solve organizations' business problems.
David has the privilege to lead the Alteryx Solutions Architecture team helping customers understand the Alteryx platform, how it integrates with their existing IT infrastructure and technology stack, and how Alteryx can provide high performance and advanced analytics. He's passionate about learning new technologies and recognizing how they can be leveraged to solve organizations' business problems.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.