I have a very large job where everything runs in-database (Databricks). At the end of the job, I attempt to write to a table, but it fails every time with a syntax/semantics error. I used a Dynamic Output In-DB tool right before the Write Data In-DB tool to output the query. The query is roughly 40,000 lines, and it runs fine when I run it in Databricks directly. The issue seems to be the Write Data In-DB tool itself; I think it might be truncating the query, causing the syntax error. Is this possible? If so, what are alternative solutions?
It's possible, but more likely:
1) Are you using Write Data In-DB? I have NEVER gotten Write In-DB to work with Databricks --- only Data Stream In...
2) Can you try writing a smaller amount of data?
3) Cluster timeout ---> this happens. A lot. Extend the cluster timeout limit for your Databricks account/token.
4) Databricks SQL issues ---> there could be syntax Alteryx generates that Databricks SQL can't handle, or vice versa. Turn on logging in your ODBC driver --- not Alteryx --- and examine the ODBC driver logs (a config sketch follows this list).
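For point 4, here is roughly what driver-side logging looks like. This is a minimal sketch assuming the Simba Spark ODBC driver (the one typically used for Databricks) on Linux; the DSN name, host, and paths are placeholders, and on Windows the same LogLevel/LogPath keys live in the DSN setup dialog instead of odbc.ini:

```ini
; Hypothetical DSN -- adjust names and paths to your install.
[Databricks]
Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so
Host=your-workspace.cloud.databricks.com
Port=443

; Logging: 0 = off, 6 = trace. At trace level the driver writes out
; every statement it receives, so you can see exactly where (or
; whether) the query Alteryx sends gets cut off.
LogLevel=6
LogPath=/tmp/simba-spark-logs
```

Diff the statement captured in the log against the 40,000-line query from your Dynamic Output tool; if the logged copy ends early, you have your truncation proof.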
@caseyz wrote: [original question quoted above]
Hello,
Yes, the Write Data In-DB tool may be truncating your large query, which would produce exactly this kind of syntax error. Some tools and drivers impose a maximum statement size, and a 40,000-line query is a plausible candidate to hit one. As alternatives, consider:
- Writing the query directly in Databricks using notebooks or SQL endpoints
- Breaking the query into smaller chunks or using stored procedures
- Using external tools like SnapLogic, or custom Python scripts with JDBC/ODBC, for finer control (see the sketch after this list)
These approaches bypass the tool's limits and ensure the full query executes.
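For the last option, a minimal sketch using the databricks-sql-connector Python package (pip install databricks-sql-connector); the hostname, HTTP path, token, and file name are placeholders you would swap for your own:

```python
from databricks import sql

# Placeholder connection details -- copy these from your cluster or
# SQL warehouse's "Connection details" tab in Databricks.
with sql.connect(
    server_hostname="your-workspace.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse-id",
    access_token="dapiXXXXXXXXXXXX",
) as connection:
    with connection.cursor() as cursor:
        # Read the full 40,000-line statement from disk and submit it
        # in one piece -- no Alteryx tool in the middle to truncate it.
        with open("big_query.sql") as f:
            cursor.execute(f.read())
```

Since this is the same query you already confirmed runs fine in Databricks directly, this mostly just automates that manual step.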
Best Regards,
Thomas Henry
If your job is that big, is there any way in Databricks to store it as a stored procedure or something that you can just call? 40K lines is a lot of code. I agree with the other suggestions: can you break it up?
40,000 lines of SQL? Can you please explain why it is so huge?
@simonaubert_bd --- that's from the Dynamic Output In-DB showing the query Alteryx is using. If you chain a bunch of In-DB tools and have a slew of fields in each tool ---> you can get there pretty fast.
@apathetichell The (I hope good) practice I use in-DB to avoid that is temporary tables. It costs a little storage, but it can help performance (or not) and makes the queries simpler.
edit: according to the OP, he uses a dynamic input tool, meaning he somehow made the code that huge himself; it isn't the In-DB nesting logic.
I'm sure there is a reason why, but I can say that can't be fun to debug 😀 The temp table approach of breaking things into pieces took a co-worker's job from nearly 2 hours down to about 25 minutes. I think that is a good path to attempt for sure @caseyz
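To make the staging idea concrete: a minimal sketch of that decomposition, using the same databricks-sql-connector package as the earlier example. The table and view names are hypothetical, and note that Databricks offers session-scoped temporary views rather than classic temp tables (assumption here: the session stays open across the statements, which it does on a single connection):

```python
from databricks import sql

# Placeholder connection details.
connection = sql.connect(
    server_hostname="your-workspace.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse-id",
    access_token="dapiXXXXXXXXXXXX",
)

# Hypothetical decomposition: instead of one 40,000-line statement,
# each step materializes one intermediate result. Temporary views are
# scoped to this session and disappear when the connection closes.
steps = [
    """CREATE OR REPLACE TEMPORARY VIEW orders_filtered AS
       SELECT * FROM sales.orders WHERE order_date >= '2024-01-01'""",
    """CREATE OR REPLACE TEMPORARY VIEW orders_enriched AS
       SELECT o.*, c.segment
       FROM orders_filtered o
       JOIN sales.customers c ON o.customer_id = c.customer_id""",
    """CREATE OR REPLACE TABLE sales.orders_final AS
       SELECT segment, COUNT(*) AS order_count
       FROM orders_enriched
       GROUP BY segment""",
]

with connection.cursor() as cursor:
    for statement in steps:
        cursor.execute(statement)  # one small statement at a time

connection.close()
```

Each statement stays small enough that no size limit comes into play, and a failure points at one step instead of somewhere inside 40,000 lines.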
@caseyz --- did you get this to work?
@dataguyW @simonaubert_bd -- I do not believe Databricks supports temp tables. It has temporary views --- but I don't think Alteryx supports them (not sure about the underlying driver).
Back to my original points ---> 1) use Data Stream In instead of Write In-DB.
@caseyz where are we at here? did this work?
Would love to understand if Write In-DB is even working for Databricks right now... my memory is that Databricks must use Data Stream In --- and this makes it look like it still requires Data Stream In:
https://help.alteryx.com/current/en/designer/data-sources/databricks.html#idp411866