Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

How To: Configure a Databricks Connection

HenrietteH
Alteryx
Alteryx
Created

How To: Configure a Databricks Connection

How to find the relevant information and configure a Databricks connection through the Manage In-DB Connections window. The Databricks connection includes bulk loading capabilities that allow users to load large data sets faster.

Prerequisites

  • Alteryx Designer
  • The Simba Spark ODBC Driver available on theAlteryx Driver Downloads page.
  • Databricks
    • Note that the screenshots were taken on AWS Databricks, but this is virtually the same for Azure Databricks

Procedure

If you have access to the Databricks console you'd like to connect to, you will be able to gather all of the necessary information to connect by yourself. If you do not have access to the console, you might need to ask your administrator for assistance.

  1. To configure the Spark ODBC Driver:Spark on Databricks - how to configure Simba Spark ODBC driver.png
  2. Setting up the In-DB connection in Alteryx:

    • The Read Tab:
      Capture.PNG
    • The Write Tab:

      Capture.PNG

Details on Bulk Loading

In Alteryx, use the Data Stream In tool to load data into Databricks. Select the connection you just created in Steps 1 and 2 above.

When you run the workflow, a temporary avro file will be created in the /FileStore/tables location in Databricks using the information provided on the Write tab in the connection. Using the information provided on the Read tab in the connection, a table will be created in Databricks and then the data will be moved from the temporary file to the table via a 'LOAD DATA INPATH' statement.

A successful run will contain the following messages (this example is for a temporary table in Alteryx):

Capture.PNG

Common Issues

User gets the following error when trying to write to Databricks via the bulk loading option (Data Stream In tool)

Data Stream In (5) Executing PreSQL: "LOAD DATA INPATH '/FileStore/tables/c8ba94243df14a778a9c/testtable1.avro' INTO TABLE testtable1;" : [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: Error running query: org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: /FileStore/tables/c8ba94243df14a778a9c/testtable1.avro;


Make sure the Databricks URL in the connection is configured as:https://Server Hostname exactly as it appears in the ODBC DSN.

No ratings
Comments
Smerker
5 - Atom

Will this work with IAMRole auth to S3?

AlexAung
6 - Meteoroid

I am trying to run Databricks system query such as "show tables" . Although I have system access right, as the query is not a type of "Select" query, how do I run it in Atleryx? I tried using "In-DB connection" tool and it will not work. 

Is there any way to run such kind of system query in Alteryx Designer? Even if using R or Python tool, it is fine too. Please help.

TheCoffeeDude
8 - Asteroid

I wish Alteryx could connect to Databricks without needing an ODBC driver.

praveenskumar
7 - Meteor

Hi @HenrietteH,

 

Thanks for sharing this useful information. I was trying to set it up at my end but I'm getting the below error message while setting it up, can you please help me with the same.

 

Error Message:

FAILED!

[Simba][ThriftExtension] (14) Unexpected response from server during a HTTP connection: SSL_connect: certificate verify failed.

 

praveenskumar_1-1654166294598.png

 

 

 

 

 

Configuration:

 

praveenskumar_2-1654166581254.png

 

praveenskumar_3-1654166620668.png

 

HTTP Option: filled with relevant details

 

 

Thank You