Alteryx Designer Ideas

Share your Designer product ideas - we're listening!

Clean up in-DB temporary tables even after workflow failure due to DB error.

Preface: I have only used the in-DB tools with Teradata, so I am unsure whether this applies to the other supported databases.

 

When building a fairly sophisticated workflow using in-DB tools, the workflow may sometimes fail because the underlying queries run up against CPU / memory limits. This is most common when doing several joins back to back, as Alteryx sends these as one big query with various nested subqueries. When working with datasets in the hundreds of millions or billions of records, this can be extremely taxing for the DB to run as one huge query. (It is possible to work around this by using the in-DB Write Data tool to write to a temporary table as an intermediate step in the workflow.)

 

When a routine does hit an in-DB resource limit and the DB kills the query, Alteryx immediately fails the workflow run. Any "temporary" tables Alteryx creates are in reality permanent tables that Alteryx normally drops at the end of a successful run. If the run does not end successfully because it hit a resource limit, these "temporary" (permanent) tables are not dropped. I only noticed this after building out a workflow and running up against a few resource limits; I then started getting database out-of-space errors. Upon looking into it, I found all the previously created "temporary" tables were still there, taking up many TBs of space.

 

My proposed solution is for Alteryx's in-DB tools to drop any "temporary" tables they have created when a run ends - regardless of whether the entire module finished successfully.
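The requested behaviour is essentially a try/finally around the staged work: the drop must run on failure as well as on success. A minimal sketch of that pattern, using Python's `sqlite3` as a stand-in database and an invented staging-table name (this is illustrative only, not Alteryx's actual internals or naming convention):

```python
import sqlite3

def run_with_cleanup(conn, work):
    """Create a staging table, run `work`, and ALWAYS drop the
    staging table afterwards -- even when `work` raises."""
    staging = "ayx_staging_demo"  # hypothetical name, not Alteryx's real convention
    conn.execute(f"CREATE TABLE {staging} (id INTEGER)")
    try:
        return work(conn, staging)
    finally:
        # Cleanup runs on success AND on failure -- the behaviour this idea requests.
        conn.execute(f"DROP TABLE IF EXISTS {staging}")

def failing_step(conn, staging):
    # Simulate the DB killing the query at a resource limit.
    raise RuntimeError("resource limit hit")

conn = sqlite3.connect(":memory:")
try:
    run_with_cleanup(conn, failing_step)
except RuntimeError:
    pass

# Despite the failed "run", no orphaned staging table is left behind.
leftover = conn.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'ayx_staging_demo'"
).fetchone()[0]
print(leftover)  # 0
```

Today, by contrast, the drop effectively sits only on the success path, so every failed run leaks a table.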

 

 

Thanks,

Ryan

4 Comments

Hi, I have experienced the same issue you have highlighted, using the in-DB tools with an AWS Redshift database.

I too used the Write In-DB tool to break up the queries, as without it a workflow with multiple in-DB tools just kept locking up. I am hitting tables with billions of rows.

 

The big problem I see is that selecting "Create temporary table" should do exactly that. It should push the data to the temp database, not create a permanent table. This really opens up the potential for big problems and clashes internally with the BI Ops / DBA team.

 

I think all in-DB tools should behave this way by default when multiple in-DB tools are linked together. That is, each step/tool should write its result to a temp table, instead of trying to form one massive CTE-based query.

 

In my experience so far with Redshift, and even MSSQL, you often get better performance by breaking up the queries and pushing the results into temp tables rather than writing one huge, complex query. With Redshift in particular, it seems able to utilize the dist and sort keys better this way.
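The staging approach described above can be sketched in miniature. This uses Python's `sqlite3` purely as a stand-in for Redshift/MSSQL (the table names and data are invented): instead of one nested join-plus-aggregate query, the join is materialised into a true session-scoped `TEMP` table first, and the aggregate then reads from that smaller staged result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(id INTEGER, cust INTEGER, amt REAL);
    CREATE TABLE custs(id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 5.0), (3, 1, 2.5);
    INSERT INTO custs  VALUES (1, 'EU'), (2, 'US');

    -- Instead of one query with the join nested inside the aggregate
    -- (the shape chained in-DB tools generate), materialise the join
    -- into a session-scoped TEMP table first. It vanishes when the
    -- connection closes -- no permanent table is left behind.
    CREATE TEMP TABLE stage1 AS
        SELECT c.region, o.amt
        FROM orders o JOIN custs c ON o.cust = c.id;
""")

# ...then aggregate from the staged result in a second, smaller query.
rows = conn.execute(
    "SELECT region, SUM(amt) FROM stage1 GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 12.5), ('US', 5.0)]
```

On Redshift the equivalent `CREATE TEMP TABLE ... AS` step is also where dist/sort keys could be applied to the intermediate result, which is what makes the staged form faster there.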

 

It would also be good to have the option to configure indexes or dist/sort keys as part of it.

Meteor

Totally agree.

 

This issue was responsible for taking down our database when it ran out of memory because of repeated 'temp' tables being left behind. I have raised this issue repeatedly with Alteryx and there seems to be no appetite to investigate or fix it. As you suggest, it just needs to clear temp tables ALWAYS, regardless of execution status. I also asked about real global temp tables being used instead of actual tables in a personal schema, but was told they are not going to do this.

 

😞

Meteor

FYI, the issue is also happening with the SAP HANA in-DB tools.

Meteoroid

Same with Hive in-DB.