Hello Community,
I have been working with Databricks for nearly 7 years.
I am very interested in Alteryx.
Can someone provide scenarios where Databricks is used with Alteryx?
I appreciate that both platforms are used to build ETL pipelines, but I was wondering whether there are any benefits to using Databricks within an Alteryx workflow?
Thanks
Carlton
Hi @carltonp,
Databricks is awesome and Alteryx complements it nicely! Alteryx provides In-DB processing and bulk loading, which means someone without SQL or Python experience can build a drag-and-drop workflow that harnesses the power of Databricks. Alteryx is also good for streaming data into Databricks and then joining it against massive data sets.
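For a sense of what the In-DB tools do on your behalf, here is a minimal PySpark sketch of the kind of join-and-aggregate work that would be pushed down to the Databricks cluster; the table and column names (sales_stream, dim_customers, customer_id, amount, region) are hypothetical stand-ins, not anything from the docs linked below.

```python
# Minimal sketch of the work In-DB tools push down to Databricks.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Large fact table already living in Databricks
facts = spark.table("sales_stream")

# Smaller dimension table, e.g. bulk-loaded from Alteryx
dims = spark.table("dim_customers")

# The join and aggregation run on the cluster; only the small
# summarized result ever leaves Databricks.
result = (
    facts.join(dims, on="customer_id", how="inner")
         .groupBy("region")
         .agg({"amount": "sum"})
)
result.show()
```

That is the whole point of In-DB: the heavy lifting stays on the cluster, and the Alteryx user never has to write the code above.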
This is a nice walk through on how to connect using the In-DB tools: https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/How-To-Configure-a-Databricks-Conne...
Official data source page: https://help.alteryx.com/20221/designer/databricks
Apache Spark on Databricks page: https://help.alteryx.com/20221/designer/apache-spark-databricks
The Alteryx partner page on Databricks website: https://www.databricks.com/partner/alteryx
If your organization has made a big investment in Databricks, it probably makes sense to extend its power to the line of business and less technical folks through Alteryx. If you haven't already, reach out to your company's Alteryx Account Executive and Sales Engineer for a demonstration. They should be able to show you how these In-DB capabilities work and how they would benefit your organization.
I do this daily - it's big-picture data engineering vs. small-picture data automation, if that makes it clearer. Databricks is owned by your data engineering team - notebooks, PySpark, big-picture pipelines. Alteryx is owned by your business owners (Finance, Accounting, FP&A, etc.). The end users and the developers are not the same people. Could you do much of what Alteryx does on Databricks? 100% yes. Could you do it efficiently, given the cost and time for PySpark engineers to implement it in Python/Scala? No. Definitely not.
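To make that trade-off concrete, this is roughly what the "technical" path looks like without Alteryx: querying Databricks directly with the databricks-sql-connector Python package. The hostname, HTTP path, access token, and table name below are placeholders, not real values.

```python
# Rough sketch of querying Databricks directly in Python, i.e. the path
# a business user avoids by using Alteryx's drag-and-drop tools instead.
# All connection details and the table name are placeholders.
from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname="your-workspace.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT region, SUM(amount) AS total FROM sales_stream GROUP BY region"
        )
        for row in cursor.fetchall():
            print(row)
```

None of this is hard for an engineer, but multiplied across hundreds of Finance/FP&A requests, it is exactly the implementation cost that makes Alteryx the cheaper tool for that audience.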