Has anyone run PySpark code on Databricks using the Apache Spark Code tool from Alteryx?
I currently use the Simba Spark driver and have configured an ODBC connection to run SQL from Alteryx through an In-DB connection, but I also want to run PySpark code on Databricks.
I explored the Apache Spark Direct connection using a Livy connection, but that appears to be only for native Spark and is validated on Cloudera and Hortonworks, not on Databricks.
If anyone has found a solution for this, please help. Thanks in advance.
Thank you for your question.
The Apache Spark Code tool is expected to work with Databricks (https://help.alteryx.com/current/DataSources/SparkDatabricks.htm).
Best,
Paul Noirel
Sr Customer Support Engineer, Alteryx
@PaulN, Thanks for the reply.
I tried the Apache Spark Direct connection to Databricks. It gives the error 'DBFS Path not specified'.
While configuring the direct connection, there is no field to specify a DBFS path. For that matter, all of the Hive tables created in our Databricks workspace use Azure Data Lake rather than the default DBFS storage (blob).
I used a simple piece of code and get this DBFS error:

    df = spark.read.table("<schema>.<tablename>")
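
For what it's worth, here is a minimal sketch of a possible workaround: read the files backing the table straight from Azure Data Lake, bypassing the metastore/DBFS lookup entirely. The abfss URI components and the Parquet format below are placeholder assumptions, and the cluster must already be configured with credentials for the ADLS account:

    # Hypothetical workaround: read the files backing the table directly
    # from Azure Data Lake Gen2 instead of resolving them through DBFS.
    # <container>, <storage-account>, and <path-to-table> are placeholders.
    df = spark.read.parquet(
        "abfss://<container>@<storage-account>.dfs.core.windows.net/<path-to-table>"
    )
    df.show(5)  # quick sanity check on the first rows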
Regards,
Harsha.
If you set up an Apache Spark on Databricks In-Database connection, you can then load .csv or .avro files from your Databricks environment and run Spark code on them.
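
As a hedged illustration of the sort of Spark code you could then run on such a file (the mount path and column name below are illustrative placeholders, not values from your environment):

    # Minimal sketch: load a .csv from the Databricks environment and run
    # Spark transformations on it. The path and column name are placeholders.
    df = spark.read.csv(
        "/mnt/<mount-point>/sample.csv",
        header=True,        # first row holds column names
        inferSchema=True,   # let Spark infer column types
    )
    df.groupBy("<some_column>").count().show()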
This likely won't give you all the functionality you need, since you mentioned you are using Hive tables created in Azure Data Lake. If that is the case, I'd recommend emailing support@alteryx.com to look into the error you are getting when using the Spark Direct tool standalone.