Alteryx Designer Desktop Discussions

Apache Spark on Databricks Error

morr-co
10 - Fireball

I have a workflow using the Apache Spark Code tool. I can connect to the Databricks cluster and stream data in. However, if the PySpark code runs longer than approximately 10 minutes, the job fails with the following message:

Databricks could not execute the statement. It ended with a lifecycle of 'RUNNING' and result state '', with the message: In run.

When I open the Databricks job link, there are no code errors; the run status simply says "Cancelled". The timeout on the connection is set to 140 minutes, yet the job typically fails after about 10 minutes. Screenshots of the workflow configuration are below. Any help is appreciated!

Screen Shot 2021-03-31 at 10.36.11 AM.png

Screen Shot 2021-03-31 at 10.35.40 AM.png

6 REPLIES
gtorres8
Alteryx Alumni (Retired)

Hi @morr-co,

Troubleshooting this will require a more in-depth investigation. Please use the "Submit a Case" option to open a case with Customer Support for further assistance.

In the meantime, here are some tips to help expedite assistance when opening a case: set the ODBC logging level to Verbose within the Spark ODBC DSN, and check the timeout settings there in case they need to be increased.

George Torres

Sr. Support Engineer
Alteryx, Inc.

PedrodeOl
9 - Comet

Hi @morr-co,

I'm facing a very similar problem with multiple streams going into the Apache Spark Code tool. Are you using the alteryxread command to read the inputs? Does it use an index greater than 1, for example alteryxread(1) and alteryxread(2)?

Could you share these lines?

linnc
5 - Atom

Has anyone found a solution here?

I am seeing the exact same issue. If I pass through a small amount of data that takes only a few minutes to process within Databricks, the workflow runs fine. As soon as Databricks has been processing for 10 minutes, the Alteryx workflow fails with the same message that morr-co mentioned above. Strangely, within Databricks the code is still processing and completes after 14 minutes. The trouble is that Alteryx has already stopped the workflow and errored the macro out at 10 minutes. This happens regardless of what the timeout setting is. From what I can tell, that setting applies to Databricks, while Alteryx appears to raise its own error at 10 minutes.
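
The symptom described here (the client gives up at 10 minutes while the run is still 'RUNNING' with an empty result state, and the job itself finishes at 14 minutes) is what you would expect from a client that polls the run status against its own fixed deadline, separate from the connection timeout. The following is only a minimal sketch of that pattern in Python, not Alteryx's actual implementation: `wait_for_run`, `RunState`, `ClientTimeout`, `simulate`, and the 30-second poll interval are all hypothetical names and values chosen for illustration. The simulation uses a fake clock, so it runs instantly.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunState:
    life_cycle_state: str  # "RUNNING" while in progress, "TERMINATED" when done
    result_state: str      # empty until the run has finished


class ClientTimeout(Exception):
    """Raised when the client's own deadline passes before the run finishes."""


def wait_for_run(
    get_state: Callable[[], RunState],
    client_deadline_s: float = 600.0,  # assumed fixed client-side default (10 min)
    poll_interval_s: float = 30.0,     # hypothetical poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> RunState:
    """Poll until the run terminates or the client-side deadline is reached."""
    start = clock()
    while True:
        state = get_state()
        if state.life_cycle_state == "TERMINATED":
            return state
        if clock() - start >= client_deadline_s:
            # The run may still be healthy; only the client has stopped waiting.
            raise ClientTimeout(
                f"lifecycle '{state.life_cycle_state}', "
                f"result state '{state.result_state}'"
            )
        sleep(poll_interval_s)


def simulate(job_duration_s: float, client_deadline_s: float) -> str:
    """Run wait_for_run against a fake clock and a job of known duration."""
    now = [0.0]  # fake clock, advanced only by the fake sleep

    def fake_clock() -> float:
        return now[0]

    def fake_sleep(seconds: float) -> None:
        now[0] += seconds

    def get_state() -> RunState:
        if now[0] >= job_duration_s:
            return RunState("TERMINATED", "SUCCESS")
        return RunState("RUNNING", "")

    try:
        final = wait_for_run(
            get_state, client_deadline_s, clock=fake_clock, sleep=fake_sleep
        )
    except ClientTimeout as exc:
        return f"client gave up: {exc}"
    return f"finished: {final.result_state}"


if __name__ == "__main__":
    print(simulate(job_duration_s=840, client_deadline_s=600))   # 14-minute job, 10-minute deadline
    print(simulate(job_duration_s=840, client_deadline_s=8400))  # same job, 140-minute deadline
```

In this sketch, the 14-minute job only succeeds when the client-side deadline itself is raised. Changing a timeout anywhere else has no effect on the loop, which would match the observation that the configured timeout makes no difference.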

apathetichell
18 - Pollux

@VojtechT, is this one of yours? If not, can you flag this to the product team, i.e. that the timeout on the Alteryx end should be customizable rather than fixed at the default 600 seconds?

linnc
5 - Atom

Hello, just following up here. Any word on a potential solution?
apathetichell, it makes sense that it keeps erroring out after 10 minutes if Alteryx has a set/default timeout of 600 seconds.

apathetichell
18 - Pollux

This is an under-the-hood issue, so it will take someone on the Alteryx dev team to fix it.
