Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

How To: Configure a Databricks Connection

HenrietteH
Alteryx
Alteryx
Created

How To: Configure a Databricks Connection

How to find the relevant information and configure a Databricks connection through the Manage In-DB Connections window. The Databricks connection includes bulk loading capabilities that allow users to load large data sets faster.

Prerequisites

  • Alteryx Designer
  • The Simba Spark ODBC Driver available on theAlteryx Driver Downloads page.
  • Databricks
    • Note that the screenshots were taken on AWS Databricks, but this is virtually the same for Azure Databricks

Procedure

If you have access to the Databricks console you'd like to connect to, you will be able to gather all of the necessary information to connect by yourself. If you do not have access to the console, you might need to ask your administrator for assistance.

  1. To configure the Spark ODBC Driver:Spark on Databricks - how to configure Simba Spark ODBC driver.png
  2. Setting up the In-DB connection in Alteryx:

    • The Read Tab:
      Capture.PNG
    • The Write Tab:

      Capture.PNG

Details on Bulk Loading

In Alteryx, use the Data Stream In tool to load data into Databricks. Select the connection you just created in Steps 1 and 2 above.

When you run the workflow, a temporary avro file will be created in the /FileStore/tables location in Databricks using the information provided on the Write tab in the connection. Using the information provided on the Read tab in the connection, a table will be created in Databricks and then the data will be moved from the temporary file to the table via a 'LOAD DATA INPATH' statement.

A successful run will contain the following messages (this example is for a temporary table in Alteryx):

Capture.PNG

Common Issues

User gets the following error when trying to write to Databricks via the bulk loading option (Data Stream In tool)

Data Stream In (5) Executing PreSQL: "LOAD DATA INPATH '/FileStore/tables/c8ba94243df14a778a9c/testtable1.avro' INTO TABLE testtable1;" : [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: Error running query: org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: /FileStore/tables/c8ba94243df14a778a9c/testtable1.avro;


Make sure the Databricks URL in the connection is configured as:https://Server Hostname exactly as it appears in the ODBC DSN.

Comments
Smerker
5 - Atom

Will this work with IAMRole auth to S3?

AlexAung
6 - Meteoroid

I am trying to run Databricks system query such as "show tables" . Although I have system access right, as the query is not a type of "Select" query, how do I run it in Atleryx? I tried using "In-DB connection" tool and it will not work. 

Is there any way to run such kind of system query in Alteryx Designer? Even if using R or Python tool, it is fine too. Please help.

TheCoffeeDude
11 - Bolide

I wish Alteryx could connect to Databricks without needing an ODBC driver.

praveenskumar
7 - Meteor

Hi @HenrietteH,

 

Thanks for sharing this useful information. I was trying to set it up at my end but I'm getting the below error message while setting it up, can you please help me with the same.

 

Error Message:

FAILED!

[Simba][ThriftExtension] (14) Unexpected response from server during a HTTP connection: SSL_connect: certificate verify failed.

 

praveenskumar_1-1654166294598.png

 

 

 

 

 

Configuration:

 

praveenskumar_2-1654166581254.png

 

praveenskumar_3-1654166620668.png

 

HTTP Option: filled with relevant details

 

 

Thank You

AlexAung
6 - Meteoroid

@praveenskumar  In the SSL Options window, uncheck the option "System Trust Store" and save it.

Then try to connect it again.

Your error showing is related to that certificate you are using. So without this option it could work.

Try it out and let us know if issue resolved.

 

veruzi
8 - Asteroid

Hi,

 

Getting this error message when trying to write to DBx:  Data Stream In (2) Error running PreSQL: "CREATE TABLE testing.alx_alteryx_test2¶(`a` INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' stored as TEXTFILE;¶LOAD DATA INPATH '/FileStore/tables/fa0035c19d0145108b7c/testing.alx_alteryx_test2.csv' INTO TABLE testing.alx_alteryx_test2;": [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Unsupported operation detected in the query plan. Only Delta data source is supported for table creation or data writing.¶Query plan:¶CreateTableCommand `hive_metastore`.`testing`.`alx_alteryx_test2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false¶¶ at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:49)¶ at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:443)¶ at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)¶ at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:41)¶ at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperat

 

But when I run the same CREATE STATEMENT in DBx directly, the table gets created successfully.

 

Alteryx v2020.4 (won't be able to upgarde as of now)

 

Please advise!

 

Raised case # 00579197

 

Thanks,

Alex

HenrietteH
Alteryx
Alteryx

Hi @veruzi 

Based on the error message, it looks as if your Databricks instance is expecting Delta Tables to be created (https://docs.databricks.com/delta/index.html)

We started supporting those with 2022.1 through the InDB tools:

HenrietteH_0-1666876200269.png

 

veruzi
8 - Asteroid

Hi @HenrietteH ,

 

Thank you for the quick reply.

As of now we are stuck with Alteryx v2020.4 and we won't upgrade any time soon.

So to support writing to DBx, could we configure DBx to support a different table format? Or do we need a different ODBC driver?

Not clear to me if the error comes from DBx server or ODBC driver version.

Also, what intrigues me the most is that if I run the same CREATE statement directly in a DBx notebook, the table gets created successfully even though it is not Delta.

 

Can you advise?

 

Thanks again,

Alex

 

 

 

 

HenrietteH
Alteryx
Alteryx

@veruzi 

I went back to the documentation to refresh my memory.

The tables are created as Delta tables by default, but you cannot use the LOAD DATA INPATH with a Delta table. It expects a COPY INTO. That's the functionality we added for 22.1.

You wouldn't be able to change that behavior with a different driver. I'm not sure if you can change the table type on the Databricks side to make tables not be Delta Tables? This would only be a workaround, the recommended solution would be to use 22.1

 

kz
6 - Meteoroid

Did not work for me. I was using OAuth token. I have compiled the instructions and placing them here:

  1. Fill up the information in the box as highlighted and click the “OAuth Options” button. You might have to follow you organization standard for “Data Source Name” and enter a meaningful description for this connection  

 

kz_0-1675456852010.png

 

lepome
Alteryx Alumni (Retired)

@veruzi (and others who see that error message about ...ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ... Error running query: org.apache.spark.sql.AnalysisException: Unsupported operation detected in the query plan. Only Delta data source is supported for table creation or data writing.)

As Henriette stated, using Delta should resolve your issue.  Please read about how to do that using Designer or Server versions 2022.1 and later at the bottom of this page.

In at least one case, this error message, generated by Databricks and passed through the Alteryx interface, was triggered by something else.  That problem was actually due to a permissions issue wherein the user did not have permissions required to create a new table in the data warehouse they were attempting to use.  If you see a message such as the one above and using Delta does not resolve it for whatever reason, you might work with the administrator of your Databricks environment to be certain that your account is granted the necessary permissions to do what you are attempting.

pcomerford
5 - Atom

Hi Folks,

 

The read works fine, but for writes using the Bulk loader i get the below errors, any ideas?

 

Screenshot 2023-03-22 at 10.15.45.pngScreenshot 2023-03-22 at 10.13.56.png

maamek
7 - Meteor

I have the same issue as @praveenskumar . Any help? That solution did not work for me

TheCoffeeDude
11 - Bolide

The issue that @praveenskumar is encountering appears to be an error thrown by Databricks when it attempts but fails to verify the authenticity of the SSL certificate for the connection. This can happen for a number of reasons, such as the certificate being expired, revoked, or not trusted by Databricks.

 

To resolve this error, you can try the following:

  • Make sure that the certificate is valid and not expired.
  • Verify that the certificate is trusted by Databricks.
  • Disable SSL certificate verification in Databricks.
maamek
7 - Meteor

@TheCoffeeDude  where do I find this certificate?😅