Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
DavidHa
Alteryx
Alteryx

Will It Elasticsearch.png

 

Elasticsearch is a distributed search engine based on Apache Lucene. It provides a REST-based full-text search engine against unstructured and semi-structured data. As such, it is a document-oriented database where the data is stored in JSON format and indexed for quick retrieval. Elasticsearch has become the most popular enterprise search engine, far surpassing Solr and Lucene. 

 

Interest in Elasticsearch has rapidly increased over the last 5 years.Interest in Elasticsearch has rapidly increased over the last 5 years.

 

 

Elasticsearch is one component in the "Elastic Stack," along with a log parsing engine called Logstash, and a visualization tool called Kibana. These three are typically implemented together to provide organizations with a way to quickly identify key information in log files and other text-based data sources. This is especially common with container-based architectures such as Alteryx Promote:

 

 

The Elastic StackThe Elastic Stack

 

 

 

Once data is in Elasticsearch, it can be quickly visualized in Kibana to understand relationships, outliers, summary information, etc...

 

Kibana allows the end user to visualize Elasticsearch data.Kibana allows the end user to visualize Elasticsearch data.

 

 

So now that we understand Elasticsearch is a distributed document store in which data is stored as JSON documents instead of a traditional database with tables comprised of rows and columns, the question on everyone's mind is "Will it Alteryx?"

 

 

 

Will it Alteryx?

 

Elasticsearch version 6.3 added SQL support through a feature called X-Pack which allows end users to query this document-oriented data structure using familiar SQL syntax, referencing table names and columns just like any other database. SQL access is enabled by default and can be controlled in the elasticsearch.yml configuration file:

 

xpack.sql.enabled: true

 

With SQL access enabled on our Elasticsearch cluster we can query, ingest, transform, and analyze Elasticsearch data using Alteryx!

 

First, we need to install an ODBC driver and configure an ODBC connection to connect to the Elasticsearch cluster. I found two options which I've linked below, but there certainly could be others. In my testing, both drivers were able to successfully connect.

  1. Elastic ODBC Client

    In the ODBC Configuration, specify the host, port, user, password, and any certificates if SSL is enabled. After entering the required connection details I was able to verify a successful connection.  

    odbc-config.PNG


  2. CData ODBC Driver

    This driver had a lot more configuration options, but only one additional setting was required which was the Authentication Scheme setting, which in my case needed to be set to BASIC.  Your environment may require a different setting. 

odbc-config-cdata.PNG

 

 

Once a successful ODBC connection to the Elasticsearch cluster has been established, we can configure an Input tool to read data from Elasticsearch using the Generic ODBC Connection:

 

The Generic ODBC ConnectionThe Generic ODBC Connection

 

 

After specifying the Elasticsearch ODBC connection I have a couple of options for pulling in data.

 

1. Specify a table from the available list:

 

Choose_Table.png

 

2. Build a custom SQL query to execute by selecting specific columns within a table in the Visual Query Builder panel.

 

Visual_Query_Builder.PNG

 

 

With the Input tool configured appropriately, I'm able to build a workflow analyzing data from the Elasticsearch cluster.  In the example workflow below my goal was to ingest airplane flight log data to understand what causes flights to be canceled, and build a model to predict future flight cancellations. 

 

Workflow.PNG

 

 

 

 

Final Thoughts

 

Elasticsearch usage is growing at an impressive rate thanks to its distributed architecture which provides linear scalability for full-text search capabilities against big data.  With SQL access, Alteryx can provide self-service analytics on semi-structured and unstructured data straight from Elasticsearch, opening up new opportunities for business insights!  

 

Until next time, please leave a comment below if you have any questions or suggestions for additional "Will it Alteryx" topics.

 

References

David Hare
Senior Manager, Solutions Architecture

David has the privilege to lead the Alteryx Solutions Architecture team helping customers understand the Alteryx platform, how it integrates with their existing IT infrastructure and technology stack, and how Alteryx can provide high performance and advanced analytics. He's passionate about learning new technologies and recognizing how they can be leveraged to solve organizations' business problems.

David has the privilege to lead the Alteryx Solutions Architecture team helping customers understand the Alteryx platform, how it integrates with their existing IT infrastructure and technology stack, and how Alteryx can provide high performance and advanced analytics. He's passionate about learning new technologies and recognizing how they can be leveraged to solve organizations' business problems.

Comments