Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
SeanAdams
17 - Castor
17 - Castor

by Sean Adams, Jan Kelley (@jankelleyfl), and Nicole Johnson (@NicoleJ


Introduction

 

So you find yourself trying to connect to data that doesn’t have an ODBC (Open Database Connection) driver available, or the ODBC driver is not particularly easy to use (e.g., IBM DB2 on the Mainframe can be quite tricky to set up).

 

What can you do?

 

Well, you always have the option to work with your tech partners to build a custom API, but this may take months or years.

 

What if we could connect to databases using JDBC (Java DB Connectivity) like our Java developers do – that may make it quicker and easier to get data into Alteryx, and also if you work in a company that uses Java very pervasively – the developers are much more familiar with JDBC and can help you connect.

 

But wait – Alteryx doesn’t support JDBC, right? Well… here’s the recipe!

 

Fundamentals of ODBC and DB Drivers

 

What is a database driver (e.g., ODBC)? Every database vendor builds its platform in different ways using different source code. So even though every relational DB vendor can insert, update, and delete rows from a table, the underlying code is different, so the way to instruct that particular database to insert a row is native to that database (hence the term “native connection”). You can definitely use this native DB interface – however, you have to go through all the hard work of learning the specific way to talk to your database, and if you change databases, this will all stop working.   

 

(Note: Native connectivity to a database can be a lot faster (albeit less portable), so in cases that require high throughput or low latency, developers will actually code directly to the native interface of this database.)

 

Now, what if you didn’t want to deal with the complexities of direct native integration to SQL Server or Oracle – and you wanted a generic way to talk to almost any DB?

 

This is where ODBC came along – the aim of ODBC was to provide a standard, database-independent way to talk to many different databases and DB platforms so that you didn’t need to learn everything about every Native Client; and didn’t need to rebuild your software every time you changed a database. The purpose of ODBC is to give you a standard “interface” (think of this like a standard UI) irrespective of the back-end – and the driver then converts this to the native language of the Database for you.

 

This has been incredibly beneficial for developers and software in general – database access & interaction is now much simpler and largely standardized, and this has led to huge freedom for analytical developers like us who can connect to pretty much any ODBC data source, as long as there’s a driver.

 

What is JDBC?

 

JDBC was designed as a way for Java Developers to connect to databases – ODBC was already a well-established standard at this time – but ODBC is predominantly a Windows standard that requires a driver to be installed. Java developers needed a way to connect to a database that would work across different platforms (since Java runs on everything from smartwatches to multi-node Linux clusters) and which would be consistent with the developer irrespective of platform.

 

So think about JDBC as very similar to ODBC if you’re on a windows platform (which you are likely to be as an Alteryx developer) – but instead of installing a driver on your machine, you point out to a jar file (a java file) which acts as the translator between your standardized instructions and the language that the database understands.

 

Example: Here’s the Microsoft JDBC driver for MS SQL Server.

 

How do you get started?

 

To connect to a database using JDBC in Alteryx, you need a few things:

 

  1. A JDBC driver in a place that is accessible (a .jar file)
  2. The name of the relevant class in that driver (e.g., the main class). Any java program of any complexity will have a few classes – the point of specifying the main class is to tell your code which one to use to start communicating with the database
  3. The hostname (the computer that the database is installed on)
  4. The database name (which database to talk to on the host – a database server can hold multiple different databases – just like a filing room can have many different filing cabinets with different content and different locks and entitlements)
  5. Your login credentials (usually username & password – but may be different for other databases)

 

By the way, if you’re learning for the first time, I strongly recommend that you try this out on a toy project where the only complexity is learning about the connection – for us, this was spinning up a 90-day evaluation version of MS SQL Server, and then connecting to that.

 

The Python

 

Step 1: Bring in the JayDeBeAPI package

 

This package was built as a bridge between JDBC and Python and is super easy to use (https://pypi.org/project/JayDeBeApi/)

 

from ayx import Alteryx

!pip install jaydebeapi

#Use pip install --upgrade --force-reinstall if you want to ensure that it reinstalls over a broken version

import jaydebeapi

 

Step 2: Make sure Java Runtime Env is set up properly

 

Now in order for this to work, you have to have the Java Runtime Environment (JRE) installed on your machine: Download Java for Windows.

 

This installation should create an environment variable called "JAVA_HOME" that generally points to C:\Program Files\java\jre1.8.0_333\bin\server.

 

SeanAdams_0-1657808262353.png

 

NOTE: To check your environment variables – just start typing Environment Variables in your start menu in Windows, and it will bring up an app for this. If you have any trouble – just google “Environment Variables Windows.”

 

Step 3: Set the basic details

 

Within Python, it’s good to set the values of all the connection parameters as separate variables – this just makes it easy for you to maintain and makes the code easier to read:

 

jarFolder = r'D:\OneDrive\Documents\Alteryx data\JDBC\sqljdbc_10.2.1.0_enu\sqljdbc_10.2\enu'
jarFile = jarFolder + r'\mssql-jdbc-10.2.1.jre17.jar'
#jarFile = r'c:\temp\JDBC\Fred.jar'

hostname = 'localhost'
database = 'TestDB'
user = 'myusername'
pwd = 'mycleverpassword'

connectionURL = r"jdbc:sqlserver://{};database={};user={};password={};encrypt=true;trustServerCertificate=true;loginTimeout=5;"
#connectionURL = r"jdbc:sqlserver://{};database={};user={};password={};"
connectionURL = connectionURL.format(hostname, database, user,pwd)

 

Notes:

  1. The JARFolder is where you store the JDBC Database Driver – in a corporate env, this may be stored on a network in some cases, but you can also bundle this into your Alteryx app
  2. JARFile – the actual name of the .jar file that holds your database driver
  3. Hostname – the server that is hosting your database
  4. Database – the name of the specific DB on the server
  5. ConnectionURL – think about this one like the connection string – the format of this is relatively consistent, but it’s good to get the exact structure from your DB vendor. In this case, I got this straight from the MS site listed above.

 

Step 4: Make the connection

 

The final step before your connection is specifying the Java class name. The easiest way to do this is from documentation from the vendor or by asking a friend – but if you’re stuck, you can inspect the JAR file to see what classes exist using the JAR.exe utility, which ships with the JDK - see Viewing the Contents of a JAR File (The Java™ Tutorials > Deployment > Packaging Programs in JAR Fil... 

 

By doing this - we see that the JAR file for MS SQL Server contains a bunch of classes - but the one we want is called com/microsoft/sqlserver/jdbc/SQLServerDriver.class which translates to a jclassname parameter of com.microsoft.sqlserver.jdbc.SQLServerDriver.

 

#Attempt a connection

javaClass = r'com.microsoft.sqlserver.jdbc.SQLServerDriver'
conn = jaydebeapi.connect(jclassname = javaClass,
                         url=connectionURL,
                          jars=jarFile)

 

As you can see here – all we’re doing is using the connectionURL above, which we created with most of the information, along with the link to the JAR file and the correct Java Class.

 

Step 5: Query data and become a famous data scientist

 

queryString = 'Select * from sys.tables'

curs = conn.cursor()
curs.execute(queryString)
results = curs.fetchall()

print(results)
curs.close()
conn.close()

 

This is a simple query that will work on any MS SQL Server (it lists all the tables in the DB) – you should experiment with this to find different ways to interact with the DB – more info is available on the JayDeBeAPI page listed above.

 

Putting it all together

 

Here’s how it looks in the Python tool in Alteryx:

 

SeanAdams_1-1657808262556.png

 

Conclusion

 

While there will be complexity with each new database you work with – at least now we have a relatively simple way to connect to databases that only offer JDBC Connectivity – and in doing so, allow you to do awesome analytics in more places!

 

Authored by Sean Adams (ACE), Jan Kelley (Analytics Super-Enabler), and Nicole Johnson (legendary Product Owner for Alteryx Designer).

Comments