on 05-16-2019 07:31 AM - edited 2 weeks ago
How To: Configure a Databricks Connection
This article explains how to find the relevant information and configure a Databricks connection through the Manage In-DB Connections window. The Databricks connection includes bulk-loading capabilities that allow users to load large data sets faster.
Note that the screenshots were taken on AWS Databricks, but the process is virtually the same for Azure Databricks.
If you have access to the Databricks console you'd like to connect to, you can gather all of the necessary connection information yourself. If you do not have access to the console, ask your administrator for assistance.
To configure the Spark ODBC Driver:
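As a rough illustration of the values the driver configuration needs, the sketch below assembles a Simba Spark ODBC connection string in Python. The hostname, HTTP path, and token are placeholders (assumptions, not real values); gather the real ones from your cluster's JDBC/ODBC tab in the Databricks console.

```python
# Hedged sketch: building a Databricks Simba Spark ODBC connection string.
# All values passed in below are hypothetical placeholders.

def databricks_odbc_conn_str(host, http_path, token):
    """Assemble an ODBC connection string for the Simba Spark driver."""
    parts = {
        "Driver": "Simba Spark ODBC Driver",
        "Host": host,
        "Port": "443",            # Databricks serves ODBC over HTTPS
        "HTTPPath": http_path,    # from the cluster's JDBC/ODBC tab
        "SSL": "1",
        "ThriftTransport": "2",   # 2 = HTTP transport
        "AuthMech": "3",          # 3 = username/password authentication
        "UID": "token",           # literal word "token" when using a PAT
        "PWD": token,             # the personal access token itself
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())

conn_str = databricks_odbc_conn_str(
    "dbc-example.cloud.databricks.com",
    "sql/protocolv1/o/0/0000-000000-example0",
    "dapiXXXXXXXXXXXX")
print(conn_str)
```

The same key/value pairs can be entered in the Windows ODBC Data Source Administrator when defining the DSN used by the In-DB connection.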
Setting up the In-DB connection in Alteryx:
The Read Tab:
The Write Tab:
Details on Bulk Loading
In Alteryx, use the Data Stream In tool to load data into Databricks. Select the connection you just created in Steps 1 and 2 above.
When you run the workflow, a temporary Avro file is created in the /FileStore/tables location in Databricks, using the information provided on the Write tab of the connection. A table is then created in Databricks using the information provided on the Read tab, and the data is moved from the temporary file into that table via a 'LOAD DATA INPATH' statement.
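The two-step bulk load described above can be sketched as the SQL that gets generated behind the scenes. Alteryx issues these statements itself; the table name and temporary file path below are hypothetical, and the column list is elided because it comes from the workflow's metadata.

```python
# Hedged sketch of the SQL Alteryx generates during a bulk load.
# Table name and temp path are hypothetical placeholders.

def build_bulk_load_sql(table, temp_file):
    """Return (create_stmt, load_stmt) for the two-step bulk load."""
    # Column definitions are derived from the workflow metadata; elided here.
    create_stmt = f"CREATE TABLE {table} (...)"
    load_stmt = f"LOAD DATA INPATH '{temp_file}' INTO TABLE {table}"
    return create_stmt, load_stmt

create_sql, load_sql = build_bulk_load_sql(
    "example_table", "/FileStore/tables/alteryx_tmp.avro")
print(load_sql)
```

This is only an illustration of the mechanism; you do not run these statements yourself when using the Data Stream In tool.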
A successful run will contain the following messages (this example is for a temporary table in Alteryx):