
Elasticsearch is a distributed search engine based on Apache Lucene. It provides a REST-based full-text search engine against unstructured and semi-structured data. As such, it is a document-oriented database where the data is stored in JSON format and indexed for quick retrieval. Elasticsearch has become the most popular enterprise search engine, far surpassing Solr and Lucene.
Interest in Elasticsearch has rapidly increased over the last 5 years.
Elasticsearch is one component in the "Elastic Stack," along with a log parsing engine called Logstash, and a visualization tool called Kibana. These three are typically implemented together to provide organizations with a way to quickly identify key information in log files and other text-based data sources. This is especially common with container-based architectures such as Alteryx Promote:
The Elastic Stack
Once data is in Elasticsearch, it can be quickly visualized in Kibana to understand relationships, outliers, summary information, etc...
Kibana allows the end user to visualize Elasticsearch data.
So now that we understand Elasticsearch is a distributed document store in which data is stored as JSON documents instead of a traditional database with tables comprised of rows and columns, the question on everyone's mind is "Will it Alteryx?"
Will it Alteryx?
Elasticsearch version 6.3 added SQL support through a feature called X-Pack which allows end users to query this document-oriented data structure using familiar SQL syntax, referencing table names and columns just like any other database. SQL access is enabled by default and can be controlled in the elasticsearch.yml configuration file:
xpack.sql.enabled: true
With SQL access enabled on our Elasticsearch cluster we can query, ingest, transform, and analyze Elasticsearch data using Alteryx!
First, we need to install an ODBC driver and configure an ODBC connection to connect to the Elasticsearch cluster. I found two options which I've linked below, but there certainly could be others. In my testing, both drivers were able to successfully connect.
- Elastic ODBC Client
In the ODBC Configuration, specify the host, port, user, password, and any certificates if SSL is enabled. After entering the required connection details I was able to verify a successful connection.

- CData ODBC Driver
This driver had a lot more configuration options, but only one additional setting was required which was the Authentication Scheme setting, which in my case needed to be set to BASIC. Your environment may require a different setting.

Once a successful ODBC connection to the Elasticsearch cluster has been established, we can configure an Input tool to read data from Elasticsearch using the Generic ODBC Connection:
The Generic ODBC Connection
After specifying the Elasticsearch ODBC connection I have a couple of options for pulling in data.
1. Specify a table from the available list:

2. Build a custom SQL query to execute by selecting specific columns within a table in the Visual Query Builder panel.

With the Input tool configured appropriately, I'm able to build a workflow analyzing data from the Elasticsearch cluster. In the example workflow below my goal was to ingest airplane flight log data to understand what causes flights to be canceled, and build a model to predict future flight cancellations.

Final Thoughts
Elasticsearch usage is growing at an impressive rate thanks to its distributed architecture which provides linear scalability for full-text search capabilities against big data. With SQL access, Alteryx can provide self-service analytics on semi-structured and unstructured data straight from Elasticsearch, opening up new opportunities for business insights!
Until next time, please leave a comment below if you have any questions or suggestions for additional "Will it Alteryx" topics.
References