Hi,
I'm new to Alteryx and looking for a way to run a hierarchical query on a huge amount of data (a PLM database).
After trying various things with front-end techniques (which are all slow), I tried the In-DB functions.
My current process is as follows.
As you can see, I use two database tables and join them together. Afterwards I would like to run Apache Spark on top of the result, which should do the hierarchical query. However, I'm not able to get Apache Spark running.
The underlying database is Oracle. My error message is always
My code is:
Does anybody have experience with Apache Spark + Oracle and can share a working process or give me a hint why it isn't working?
If you have an alternative approach to the hierarchical query problem, that is welcome too.
Thank you,
Tom
I am pretty sure (like high 90s) you can't do what you are trying to do. Does your Oracle PLM database even natively support Spark? That tool is really designed for a Databricks connection (or Azure, or maybe EMR/Dataproc).
A bit more of an explanation: to use Spark you have to connect to a Spark cluster. It can be self-hosted/provisioned, it can be EMR, or it can be Databricks, but it has to be a Spark cluster. Your Spark cluster may be able to connect to your Oracle DB via JDBC, and you would set up that connection on the cluster. You may then be able to execute Spark commands on it via Alteryx's Spark tool. Your In-DB connection would point to the Spark cluster, not the Oracle DB, and you'd obviously have to pay all of the upkeep/maintenance/data transfer costs for and between your Spark cluster and your Oracle DB.
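To make the "Spark cluster talks to Oracle over JDBC" part concrete, here is a minimal sketch of what the code running on the cluster might look like. It is an illustration under assumptions, not a tested Alteryx workflow: the table `plm_structure`, its columns `parent_id` and `child_id`, the host, the service name, and the credentials are all hypothetical, and the Oracle JDBC driver jar has to be installed on the cluster. The hierarchical part is pushed down to Oracle with `CONNECT BY`, so Spark only receives the result.

```python
def oracle_jdbc_options(host, port, service, query, user, password):
    """Build the option map for Spark's built-in JDBC data source."""
    return {
        # Oracle thin-driver URL in host:port/service-name form.
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "driver": "oracle.jdbc.OracleDriver",
        # The query is executed by Oracle, not by Spark.
        "query": query,
        "user": user,
        "password": password,
    }


# Hypothetical structure table: one row per parent/child link.
HIERARCHY_QUERY = (
    "SELECT parent_id, child_id, LEVEL AS depth "
    "FROM plm_structure "
    "START WITH parent_id IS NULL "
    "CONNECT BY PRIOR child_id = parent_id"
)


def read_hierarchy(spark, options):
    """Load the query result as a DataFrame; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Usage on the cluster (all values are placeholders):
# opts = oracle_jdbc_options("oracle-host", 1521, "PLMSVC",
#                            HIERARCHY_QUERY, "plm_user", "secret")
# df = read_hierarchy(spark, opts)
```

If the hierarchy can be resolved by Oracle itself like this, it may be worth checking whether the same `CONNECT BY` query can simply be run against Oracle directly, without a Spark cluster in between.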