
Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

DataStream and Apache Spark

MHP_Guy
6 - Meteoroid

Hi, 

 

I'm new to Alteryx and looking for a way to run a hierarchical query on a huge amount of data (a PLM database).

After trying various things using front-end techniques (which are all slow), I tried the In-DB tools.

 

My current process is as follows:

 

 

[image: workflow screenshot]

As you can see, I use two database tables and join them together. Afterwards, I would like to run Apache Spark on top of the result to do the hierarchical query. However, I'm not able to get Apache Spark running.

 

The database in the background is Oracle. My error message is always:

 

[image: error message screenshot]

 


My code is: [image: code screenshot]

 

Does anybody have experience with Apache Spark + Oracle and can share a working process, or give me a hint why it isn't working?

 

If you have an alternative approach to the hierarchical query problem, that is welcome too.
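For context, the expansion I'm after is the kind of thing Oracle's `CONNECT BY PRIOR` (or a recursive `WITH` clause) produces. A minimal plain-Python sketch over parent/child rows, with made-up column values, shows the logic:

```python
from collections import defaultdict, deque

def expand_hierarchy(edges, root):
    """Walk a parent->child table from `root` and return each reachable
    item with its depth, like Oracle's CONNECT BY ... LEVEL.

    edges: iterable of (parent_id, child_id) rows.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    result = []
    queue = deque([(root, 1)])  # (item, level); the root is level 1
    while queue:
        item, level = queue.popleft()
        result.append((item, level))
        for child in children[item]:
            queue.append((child, level + 1))
    return result

# Example parent/child rows, e.g. a small bill-of-materials fragment
rows = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")]
print(expand_hierarchy(rows, "A"))
# [('A', 1), ('B', 2), ('C', 2), ('D', 3), ('E', 3)]
```

Since the backend is Oracle, the same expansion could presumably be pushed In-DB with a `CONNECT BY PRIOR` query instead of Spark.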

 

Thank you,

Tom

2 REPLIES
apathetichell
19 - Altair

I am pretty sure (like high 90s) you can't do what you are trying to do. Does your Oracle PLM database even natively support Spark? That tool is really designed for a Databricks (or Azure, or maybe EMR/Dataproc) connection.

 

 

apathetichell
19 - Altair

A bit more of an explanation --- to use Spark you have to connect to a Spark cluster. This can be self-hosted/provisioned, it can be EMR, or it can be Databricks - but it's a Spark cluster. Your Spark cluster may be able to connect to your Oracle DB via JDBC; you would handle that connection on the cluster. You then might be able to execute Spark commands on it via Alteryx's Spark tool. Your In-DB connection would point at the Spark cluster - not the Oracle DB --- and you'd obviously have to pay all of the upkeep/maintenance/data transfer costs for and between your Spark cluster and your Oracle DB.
