Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Running PySpark code on Databricks using the Apache Spark Code tool from Alteryx

Has anyone run PySpark code on Databricks using the Apache Spark Code tool from Alteryx?


I currently use the Simba Spark driver and have configured an ODBC connection to run SQL from Alteryx through an In-DB connection. But I also want to run PySpark code on Databricks.
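For reference, the existing ODBC route can be driven from Python as well. The sketch below assumes a DSN named "DatabricksSpark" configured against the Simba Spark driver and a Databricks personal access token; the DSN name and token are placeholders, and the actual query requires the `pyodbc` package and a reachable cluster.

```python
def build_conn_str(dsn, token):
    """Build an ODBC connection string for a Simba Spark DSN.

    Databricks conventionally uses the literal user name 'token'
    with a personal access token as the password.
    """
    return f"DSN={dsn};UID=token;PWD={token}"

conn_str = build_conn_str("DatabricksSpark", "<personal-access-token>")

# To actually run SQL (requires pyodbc and a live cluster):
# import pyodbc
# with pyodbc.connect(conn_str, autocommit=True) as conn:
#     rows = conn.cursor().execute("SELECT 1").fetchall()
```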


I explored an Apache Spark Direct connection using Livy, but that appears to support only native Spark and is validated on Cloudera and Hortonworks, not on Databricks.
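For context on what the Livy route involves: Livy exposes a REST API where a PySpark script is submitted as a JSON body to the `/batches` endpoint. The sketch below only builds that request body; the server URL and script path are placeholders, and the commented-out submission requires the `requests` package and a reachable Livy server.

```python
import json

# Hypothetical Livy endpoint -- adjust for your cluster.
LIVY_URL = "http://livy-server:8998"

def build_batch_payload(script_path, exec_memory="2g", num_executors=2):
    """Build the JSON body for a POST to Livy's /batches endpoint."""
    return {
        "file": script_path,          # PySpark script visible to the cluster
        "executorMemory": exec_memory,
        "numExecutors": num_executors,
    }

payload = build_batch_payload("/path/to/job.py")
body = json.dumps(payload)

# To submit the batch:
# import requests
# resp = requests.post(f"{LIVY_URL}/batches", data=body,
#                      headers={"Content-Type": "application/json"})
# batch_id = resp.json()["id"]
```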


If anyone has found a solution for this, please help. Thanks in advance.

Alteryx Alumni (Retired)

Hi @harsha_kapaganty,


Thank you for your question.


The Apache Spark Code tool is expected to work with Databricks.




Paul Noirel

Sr Customer Support Engineer, Alteryx

@PaulN, Thanks for the reply. 


I tried an Apache Spark Direct connection to Databricks, but it gives the error 'DBFS Path not specified'.
While configuring the direct connection there is no field to specify a DBFS path and, for that matter, all of the Hive tables created in our Databricks workspace use Azure Data Lake rather than the default DBFS storage (blob).


I used a simple piece of code and I get this DBFS error:

df = spark.table("<schema>.<tablename>")  # read the Hive table into a DataFrame


(attached screenshot: dbfs error.jpg)
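Since the tables are backed by Azure Data Lake rather than DBFS, one workaround worth sketching is addressing the storage location directly with an `abfss://` URI instead of going through the metastore. The container, storage-account, and path names below are placeholders, and the commented-out read requires a Databricks cluster with ADLS credentials configured.

```python
def abfss_path(container, account, relative_path):
    """Build an abfss:// URI for a location in ADLS Gen2."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path}"

path = abfss_path("datalake", "mystorageacct", "warehouse/myschema/mytable")

# On a cluster with access configured, either of these could work,
# depending on the table's format:
# df = spark.read.parquet(path)
# df = spark.table("myschema.mytable")
```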



If you set up an Apache Spark On Databricks In-Database connection, you can then load .csv or .avro files from your Databricks environment and run Spark code on them.
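As a rough illustration of the kind of Spark code that could run inside the Apache Spark Code tool after such a connection is set up: the snippet below assumes the tool provides a `spark` session, and the file path and column names are made up for the example. Only the pure parsing helper runs outside a cluster.

```python
def normalize_amount(raw):
    """Parse an amount string like ' 1,234.50 ' into a float."""
    return float(raw.strip().replace(",", ""))

# On the cluster, applied to a loaded .csv as a UDF:
# from pyspark.sql.functions import udf
# from pyspark.sql.types import DoubleType
# norm = udf(normalize_amount, DoubleType())
# df = spark.read.option("header", True).csv("/mnt/input/sales.csv")
# df = df.withColumn("amount", norm(df["amount"]))
```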




This likely won't give you all the functionality you need, since you mentioned you are using Hive tables created in Azure Data Lake. If that's the case, I'd recommend emailing Support to look into the error you are getting when using the Spark Direct tool standalone.