Engine Works

DavidHa · ‎10-01-2019

Will It Elasticsearch.png

Elasticsearch is a distributed search engine based on Apache Lucene. It provides a REST-based full-text search engine against unstructured and semi-structured data. As such, it is a document-oriented database where the data is stored in JSON format and indexed for quick retrieval. Elasticsearch has become the most popular enterprise search engine, far surpassing Solr and Lucene.

Interest in Elasticsearch has rapidly increased over the last 5 years. Interest in Elasticsearch has rapidly increased over the last 5 years.

Elasticsearch is one component in the "Elastic Stack," along with a log parsing engine called Logstash, and a visualization tool called Kibana. These three are typically implemented together to provide organizations with a way to quickly identify key information in log files and other text-based data sources. This is especially common with container-based architectures such as Alteryx Promote:

The Elastic Stack The Elastic Stack

Once data is in Elasticsearch, it can be quickly visualized in Kibana to understand relationships, outliers, summary information, etc...

Kibana allows the end user to visualize Elasticsearch data. Kibana allows the end user to visualize Elasticsearch data.

So now that we understand Elasticsearch is a distributed document store in which data is stored as JSON documents instead of a traditional database with tables comprised of rows and columns, the question on everyone's mind is "Will it Alteryx?"

Will it Alteryx?

Elasticsearch version 6.3 added SQL support through a feature called X-Pack which allows end users to query this document-oriented data structure using familiar SQL syntax, referencing table names and columns just like any other database. SQL access is enabled by default and can be controlled in the elasticsearch.yml configuration file:

xpack.sql.enabled: true

With SQL access enabled on our Elasticsearch cluster we can query, ingest, transform, and analyze Elasticsearch data using Alteryx!

First, we need to install an ODBC driver and configure an ODBC connection to connect to the Elasticsearch cluster. I found two options which I've linked below, but there certainly could be others. In my testing, both drivers were able to successfully connect.

Elastic ODBC Client

In the ODBC Configuration, specify the host, port, user, password, and any certificates if SSL is enabled. After entering the required connection details I was able to verify a successful connection.
CData ODBC Driver

This driver had a lot more configuration options, but only one additional setting was required which was the Authentication Scheme setting, which in my case needed to be set to BASIC. Your environment may require a different setting.

Once a successful ODBC connection to the Elasticsearch cluster has been established, we can configure an Input tool to read data from Elasticsearch using the Generic ODBC Connection:

The Generic ODBC Connection The Generic ODBC Connection

After specifying the Elasticsearch ODBC connection I have a couple of options for pulling in data.

1. Specify a table from the available list:

2. Build a custom SQL query to execute by selecting specific columns within a table in the Visual Query Builder panel.

With the Input tool configured appropriately, I'm able to build a workflow analyzing data from the Elasticsearch cluster. In the example workflow below my goal was to ingest airplane flight log data to understand what causes flights to be canceled, and build a model to predict future flight cancellations.

Final Thoughts

Elasticsearch usage is growing at an impressive rate thanks to its distributed architecture which provides linear scalability for full-text search capabilities against big data. With SQL access, Alteryx can provide self-service analytics on semi-structured and unstructured data straight from Elasticsearch, opening up new opportunities for business insights!

Until next time, please leave a comment below if you have any questions or suggestions for additional "Will it Alteryx" topics.

References

wale_ilori · ‎10-01-2019

Thanks for the write up. Have tried this before but seems to be a handy bit of knowledge to have.

renatofeio · ‎06-04-2020

You mention in the text the possibility of ingesting data on Elastic from Alteryx, have you managed to do this using the same steps? I see that this way of using Alteryx to do data analysis and treatment and then insert the data into elasticsearch is an interesting possibility, since the ease that Altery promotes to its end user, where there is no need for programming, as it happens no elastic, so you save time and save on project hours.

lacymills2 · ‎10-03-2023

Just FYI from experience - when connecting to a specific index (dataset) the driver version you install has to be compatible with the index you're connecting to. Ran into a lot of issues before figuring that out but once I did, works great!

Engine Works

Elasticsearch, will it Alteryx?

Will it Alteryx?

Final Thoughts

References

Alteryx Developer

Alteryx Solution Center - 2022

Alteryx mentor

Automated Alteryx Reports

Re: Onboarding 1: Get to Know Alteryx