This is Part 2 of the “Ins and Outs of In-DB” series. Check out Part 1 for an intro to in-database processing and how to get started with Snowflake.
I’ve been going down (“pushing” down?!) a rabbit hole about in-database.
As I wrote in my last post, in-database (in-DB) processing is when you push down into a data source and do your prepping, blending, and analyzing right there in the database. Since the data never has to leave the database, executing in-DB is usually more efficient than pulling everything out first.
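To make the "not moving data around" point concrete, here’s a minimal sketch in Python. It uses the standard library’s `sqlite3` as a local stand-in for a remote warehouse, and the `orders` table and both function names are made up for illustration. It isn’t how Alteryx works under the hood; it just compares how many rows come back when you filter locally versus letting the database do the work.

```python
import sqlite3

# sqlite3 is only a local stand-in for a remote warehouse such as Redshift.
# The "orders" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "west" if i % 4 == 0 else "east", float(i)) for i in range(1, 1001)],
)


def total_without_pushdown(conn):
    """Pull every row out of the database, then filter and sum locally."""
    rows = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    total = sum(amount for _, region, amount in rows if region == "west")
    return total, len(rows)


def total_with_pushdown(conn):
    """Let the database filter and aggregate; only the answer comes back."""
    rows = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE region = 'west'"
    ).fetchall()
    return rows[0][0], len(rows)


local_total, rows_moved_local = total_without_pushdown(conn)
pushed_total, rows_moved_pushed = total_with_pushdown(conn)

print("Same answer:", local_total == pushed_total)
print("Rows moved without pushdown:", rows_moved_local)
print("Rows moved with pushdown:", rows_moved_pushed)
```

Both approaches produce the same total, but the pushed-down version returns a single row instead of the whole table. With a real warehouse, that gap is the network transfer and local memory you save.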
There are quite a few data sources that Alteryx can connect with to perform in-DB workflows. Today I’m hyping up AWS because it lives up to the hype: AWS makes it easier to do big – dare I say, awe-inspiring – things with Alteryx.
If your company uses cloud technology, there’s roughly a one-in-three chance it runs on AWS. With 33% of the cloud market, AWS is a leader in cloud infrastructure. So it’s good to know what’s what in the world of AWS and how you can use it with Alteryx.
AWS has an all-you-can-eat buffet’s worth of tools for storing and managing data, from data lakes to warehouses. And we can access it all without having to be a cloud admin or engineer.
Here’s a rundown of the AWS data services you’re likely to use with Alteryx:
- S3: It’s no shocker that S3 is one of the most popular storage services used with Alteryx. It lets you store as much as you want and whatever you want: it takes both structured and unstructured data, making it well-suited for analysis you’re doing in Alteryx on images, sound files, or other non-text files. To bring that data into a workflow, use the Amazon S3 Download tool.
- Athena: Amazon’s interactive query service lets you query data directly from S3. This comes in handy when you’re quickly retrieving subsets of data to analyze in Alteryx.
- Aurora: This is a relational database management system designed for high performance in the cloud, a powerful tool for your extra-demanding workloads.
- Redshift: Another popular choice for Alteryx users, Amazon’s cloud data warehouse is optimized for large amounts of data (unfathomable petabytes) and massive scalability.
That last one, Redshift, is the place to be for in-database processing.
And because Redshift is so great for enterprise-level (read: huge, powerful) workloads, you can do big things with in-DB.
Alteryx supports a range of in-database tools on Redshift, including Filter, Join, Union, and more. As with any powerful cloud source, it’s best to do your in-DB processing earlier in the workflow (like the prep and blend stages). That way, you really take advantage of AWS’ compute resources instead of using your own.
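One way to picture what a chain of in-DB tools does: each tool wraps the previous step in another layer of SQL, and the database runs the whole thing as one query. The sketch below is an illustration under assumptions, not the SQL Alteryx actually generates. The helper functions (`union_step`, `filter_step`, `join_step`) and the tables are hypothetical, and `sqlite3` again stands in for Redshift.

```python
import sqlite3


def union_step(first_sql, second_sql):
    """Stack two inputs with matching columns (a Union-style step)."""
    return f"SELECT * FROM ({first_sql}) UNION ALL SELECT * FROM ({second_sql})"


def filter_step(source_sql, condition):
    """Keep only rows that satisfy the condition (a Filter-style step)."""
    return f"SELECT * FROM ({source_sql}) AS f WHERE {condition}"


def join_step(left_sql, right_sql, key, right_column):
    """Add one column from the right input by matching on a key (a Join-style step)."""
    return (
        f"SELECT l.*, r.{right_column} FROM ({left_sql}) AS l "
        f"JOIN ({right_sql}) AS r ON l.{key} = r.{key}"
    )


# Hypothetical sample data; sqlite3 is a local stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders_2023 (customer_id INTEGER, amount REAL);
    CREATE TABLE orders_2024 (customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, segment TEXT);
    INSERT INTO orders_2023 VALUES (1, 50.0), (2, 150.0);
    INSERT INTO orders_2024 VALUES (1, 200.0), (3, 75.0);
    INSERT INTO customers VALUES (1, 'gold'), (2, 'silver'), (3, 'bronze');
    """
)

# Each step only builds SQL text; nothing runs until the very end.
all_orders = union_step(
    "SELECT customer_id, amount FROM orders_2023",
    "SELECT customer_id, amount FROM orders_2024",
)
big_orders = filter_step(all_orders, "amount >= 100")
final_sql = join_step(
    big_orders,
    "SELECT customer_id, segment FROM customers",
    key="customer_id",
    right_column="segment",
)

# One round trip: the database does the union, filter, and join itself.
rows = sorted(conn.execute(final_sql).fetchall())
for row in rows:
    print(row)
```

Nothing touches the data until the final `execute` call, which is why doing the heavy prep and blend steps in-DB keeps the work on the warehouse’s compute rather than your machine’s.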
To get it set up, make sure you’ve got an AWS account and access to Redshift. Download and install the ODBC driver for Redshift, which is what makes the connection possible. Then connect to Redshift by dropping the Connect In-DB tool on the canvas. Add a new connection and select Amazon Redshift from the list of data sources.
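If you’re curious what the ODBC driver needs in order to find your cluster, here’s a rough sketch of a DSN-less connection string built in Python. Treat the details as assumptions: the driver name `Amazon Redshift (x64)` depends on which driver version you installed, the server, database, and user values are placeholders, and the helper function is hypothetical. In Alteryx you’d normally enter these same details through the connection dialog or a DSN instead.

```python
def redshift_odbc_connection_string(
    server,
    database,
    user,
    password,
    port=5439,  # Redshift's default port
    driver="Amazon Redshift (x64)",  # assumption: check your installed driver's name
):
    """Assemble a DSN-less ODBC connection string from its parts."""
    parts = {
        "Driver": "{" + driver + "}",
        "Server": server,
        "Port": str(port),
        "Database": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Placeholder values only; swap in your own cluster endpoint and credentials.
connection_string = redshift_odbc_connection_string(
    server="example-cluster.example.us-west-2.redshift.amazonaws.com",
    database="dev",
    user="analyst",
    password="<your-password>",
)
print(connection_string)
```

The pieces map onto what the connection setup asks for: which driver to use, where the cluster lives, which database to open, and who you are.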
Once you’re connected, this is the part when you get to do something awe-inspiring.
When you use Alteryx in-database with Redshift, you can accomplish some truly herculean tasks. I’m talking superhuman undertakings that wouldn’t be possible (or would at least be way more time-consuming) without the AWS engine’s horsepower and Alteryx to harness it with in-DB.
Think big. One Alteryx user was able to construct an entirely new data warehouse. Another example? The Chick-fil-A app!
Chick-fil-A used Redshift and Alteryx together to build its customer loyalty program. By using Alteryx to process billions of customer records within Redshift, Chick-fil-A could efficiently access the data needed to get the loyalty program up and running.
Feeling inspired? Ready to start building the next game-changing customer loyalty program? Whether you’re embarking on an ambitious project or just looking to try out a few workflows, the AWS Starter Kit is a good place to get started. It’s a three-in-one guide with ready-to-use templates for Redshift, S3, and Aurora.
Happy pushing!