
Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.



Welcome to a new Engine Works blog series, where I'll be exploring various data sources, data types, and modeling techniques, and answering the age-old question, "Will it Alteryx?"  


I'll start this series with Presto, a technology that many customers have been asking about recently.  Presto is a distributed SQL query engine that can work with many data source systems.  It uses an MPP (massively parallel processing) design, where SQL queries are processed in parallel across a cluster of machines.  This may sound identical to other MPP-style data sources like Hadoop or Teradata; however, the major difference is that Presto does not store the data itself.  In fact, Presto is much more flexible and pluggable, as it can be configured to query multiple back-end source systems through "connectors."  Many organizations are adopting Presto as a single engine to query against all available data sources.


Presto can connect to many data source and BI/Reporting systems.


Environment Setup


To find out whether Alteryx can work with Presto, I needed an environment to test in.  Presto follows a familiar architecture with one master and multiple workers.  In Presto terms, the master is called the "Coordinator" and is responsible for handling client connections and distributing work to the workers.  The workers retrieve the data from the source systems defined by the connectors, and work together to process the requests directed by the Coordinator.
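As a sketch of what this looks like on disk, each node carries an `etc/config.properties` file that declares its role and points at the Coordinator's discovery service.  The hostname and memory limit below are illustrative values for a cluster like the one described here:

```properties
# etc/config.properties on the Coordinator (illustrative values)
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
discovery-server.enabled=true
discovery.uri=http://presto-coordinator:8080

# etc/config.properties on each Worker
# coordinator=false
# http-server.http.port=8080
# discovery.uri=http://presto-coordinator:8080
```

Once every node points its `discovery.uri` at the Coordinator, the workers register themselves automatically and show up in the Web Interface.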


This fits perfectly with the Hadoop architecture, so I colocated the Presto Coordinator on a Hadoop NameNode and configured Presto Workers on three Hadoop DataNodes.  After configuring the workers and Coordinator to all talk to each other, I was able to validate my new Presto cluster using the Web Interface:


Presto Web UI


Connectors and Catalogs and Schemas! Oh My!


With Presto, there are a couple of new terms to be aware of, in addition to the database schema, that define how to connect to data sources.  A "connector" defines how Presto connects to the back-end data source system, including the server hostname, port, and credentials.  A "catalog" contains schema information and references a specific connector.  All of these are easily configured in properties files, and in no time at all I had connections to Hive, MongoDB, PostgreSQL, Amazon Redshift, and Microsoft SQL Server:


[root@presto-coordinator catalog]# ls -alh
-rw-r--r-- 1 root root 100 May 22 15:41
-rw-r--r-- 1 root root 82 May 22 14:38
-rw-r--r-- 1 root root 133 May 22 14:25
-rw-r--r-- 1 root root 131 May 22 15:19
-rw-r--r-- 1 root root 146 May 22 14:48
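Each of the files above is a catalog properties file.  As a sketch, a PostgreSQL catalog might look like the following; the server hostname, database name, and credentials are illustrative, not values from my cluster:

```properties
# etc/catalog/postgresql.properties (illustrative values)
connector.name=postgresql
connection-url=jdbc:postgresql://pg-server:5432/analytics
connection-user=presto
connection-password=secret
```

The filename (minus the `.properties` extension) becomes the catalog name that clients reference in queries.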


Querying Your Data


Presto includes a command line tool for running queries against specific data sources that have been configured in your Presto cluster.  It can be launched against a catalog and schema to allow users to run queries interactively:


[root@presto-coordinator presto]# ./presto --catalog hive --schema default
presto:default> show tables;
(4 rows)

Query 20190529_124127_00007_k5hzu, FINISHED, 4 nodes
Splits: 53 total, 53 done (100.00%)
0:00 [4 rows, 513B] [71 rows/s, 2.57KB/s]



Or, the query can be specified directly on the command line:


[root@presto-coordinator presto]# ./presto --catalog hive --schema default --execute "SELECT * FROM acmebank_stress_test WHERE loan_type='Term loan' AND collateral>=100000;"

"2008-06-30 00:00:00.000","100012891","36","32","BB","Pool5","1952021.0","Term loan","","1764250.0","112457.708875471","506059.689939621","-13899.7728170082","-13775.9730136383","-110529.743351998","","","","1764250.0","0.007967812730301","0.0","0","253029.84496981","197363.279076452","168686.563313207","0.151388441875723","267087.058579244","154629.349703773","0.087645940033313","0.26293782009994","510786.287443657","506059.689939621","0.990746428359126","0.15","47","2","WA","West","Retail Lending","9","Mortgage","Consumer","Bank B","Collateral"


Will it Alteryx?


With an environment up and running, it's time to answer the question on everyone's mind: "Presto, will it Alteryx?"  The answer is yes, it will!  Although Presto is not listed as a Supported Alteryx Data Source, its data can be read in through the standard Input tool using a Generic ODBC connection.  To support this, the machine where Alteryx is running (server, laptop, or desktop) needs the Presto ODBC Driver installed.


Additionally, an ODBC configuration must exist for each data source you intend to query.  The Presto Coordinator hostname and port must be specified, along with any credentials required to make the connection:


ODBC Driver Setup


Catalog and Schema selections allow you to specify the intended data source to connect to:

ODBC Driver Setup - Catalog selection
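The same settings shown in the driver dialog can also be expressed as a DSN-less ODBC connection string, which is handy for scripting outside of Designer.  This is a minimal sketch; the driver name, hostname, and user below are assumptions for illustration, and the exact driver name must match what your Presto ODBC installation registers:

```python
def presto_odbc_conn_str(host, port, catalog, schema, user):
    """Build a DSN-less ODBC connection string for a Presto data source."""
    parts = {
        "Driver": "{Simba Presto ODBC Driver}",  # assumed driver name
        "Host": host,          # Presto Coordinator hostname
        "Port": str(port),     # Coordinator HTTP port
        "Catalog": catalog,    # which catalog (connector) to query
        "Schema": schema,      # default schema within that catalog
        "UID": user,
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())

conn_str = presto_odbc_conn_str("presto-coordinator", 8080, "hive", "default", "alteryx")
print(conn_str)

# With pyodbc installed and the driver present, the string could then be used as:
# import pyodbc
# conn = pyodbc.connect(conn_str)
```

Note that the Catalog and Schema keys play the same role as the `--catalog` and `--schema` flags on the Presto CLI shown earlier.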

Once the ODBC Driver has been configured, the Input tool can be easily configured to pull data from Presto into Alteryx Designer for processing:


Reading in data from Hive through Presto


While this blog is not meant to provide performance details, I did observe consistently faster results reading a test table from Presto (~12 seconds) compared to reading from Hive (~14 seconds) or Impala (~24 seconds).  Additionally, I was able to use the In-Database tools with Presto by specifying a Generic ODBC Connection, and the processing completed in under 3 seconds!


In-Database workflow using Presto


Final Thoughts


Alteryx working with Presto opens up many opportunities for data analysis as Presto continues to adopt more connectors to other data sources.  Although Presto is not currently listed as a supported data source for the standard Input tool or the In-Database tools, it can still be queried using a generic ODBC connection.  With Presto's growing popularity, it is a likely candidate for future addition to the supported data sources list.


If you have any technologies you would like to see explored in future installments of the "Will it Alteryx" series, please leave a comment below!  


Fun Facts

  1. "Presto" is Italian for "quickly."
  2. Presto was originally developed at Facebook to provide faster queries against data in Hive. 



David Hare
Manager, Solutions Architecture

David is a manager of the Solutions Architecture team helping customers understand the Alteryx platform, how it integrates with their existing IT infrastructure, and how Alteryx can provide high performance and advanced analytics. He's passionate about learning new technologies and recognizing how they can be leveraged to solve organizations' business problems.



I'd like to see a more thorough look at Snowflake.


Great article David. Presto is extremely fast over Hive and having Presto be available as part of the Alteryx arsenal will take users to the next level.