Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!
The Product Idea boards have gotten an update to better integrate them within our Product team's idea cycle! However this update does have a few unique behaviors, if you have any questions about them check out our FAQ.

Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!
Submitting an Idea?

Be sure to review our Idea Submission Guidelines for more information!

Submission Guidelines

Make In-DB Queries much more efficient (dont nest everything or use Select *!)

I reported this to the support team but was told it was by design and to post here.

 

In-DB Inefficient SQL

I would like to report that the In-DB tools are generating horribly inefficient SQL code for simple operations.  It seems no matter what tools you use every statement is starting with a nested 'Select * From'.

 

Example Simple workflow:

  Support1.jpgSupport2.jpg

 

This is a simple Select and Group by but the SQL Generated is:

 

SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit"

FROM (SELECT * FROM "_SYS_BIC"."shell.app.gsap.FL000_LSC.FL002_CTS.INT.RPT/CA_CTS_RPT_MAIN_001") AS "a"

GROUP BY "ShipTo", "ShipTo_Name"

 

This is taking a very long time to execute:

 

Statement 'SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit" FROM ...'

successfully executed in 15.752 seconds  (server processing time: 15.699 seconds)

 

Whereas if I take the same query and remove the nested Select *:

 

SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit"

FROM "_SYS_BIC"."shell.app.gsap.FL000_LSC.FL002_CTS.INT.RPT/CA_CTS_RPT_MAIN_001" AS "a"

GROUP BY "ShipTo", "ShipTo_Name"

 

It is very quick:

 

Statement 'SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit" FROM ...'

successfully executed in 1.211 seconds  (server processing time: 1.157 seconds)

 

So Alteryx is generating queries up to x13 slower than they should be thereby defeating the point of using In-DB.  As you can imagine in a workflow where we have multiple Connect In-DB tools this is a really substantial amount of time.  Example used above is from SAP HANA DB has 1.9m rows and ~90 columns but we have much bigger tables/views than this.

 

If you look you will see its same behaviour for all In-DB tools where each tool creates another nested Select with its particular operator.

 

MY SUGGESTION:

So my suggestion is that Alteryx should combine the SQL of the first few tools and avoid using SELECT * completely unless no Select tools have been used.  So it should combine:

- Connect In-DB + Select

- Connect In-DB + Filter

- Connect In-DB + Summarise

 

Preferably it should combine/flatten everything up until the first join or union.  But Select + Filter are a must!

 

Note it seems some DB's can cope OK with un-nesting these big nested queries in the query plans for some Tables but normally not for Views.  But some cannot cope at all and so the In-DB tools cannot even be used to Browse 100 records (due to select *).

18 Comments
Atabarezz
13 - Pulsar

Quick diagnostics;

 

1) Are you sure for the two inputs to the union tool, the columns have the same datatypes? This is the most common issue in unions...

 

2) Also you can union and join in-db, why don't you do that any specific reason?

 

3) For handling slowness why don't you filter at the very early stages of your workflow?

stj1120
8 - Asteroid

The data is not flowing into the union tool. Also, I'm confused where to use the filter ROW_ID. Kindly please help me out. 

 

Thanks in advance. 

stj1120
8 - Asteroid

The data is not flowing into the union tool. The data type is different for the two columns passing into the union tool. I'm little confused where to use the filter tool and summarize tool for ROW_ID. 

Just because of that,I'm getting incorrect output. Also I'm confused about whether I need to use self join or some other. 

Kindly please help me out. Many thanks!

 

Best Regards,

Sreenivasa Teja.

AlteryxCommunityTeam
Alteryx Community Team
Alteryx Community Team
Status changed to: Accepting Votes
 
harsh_alang
Alteryx
Alteryx
Status changed to: Under Review

Engine team will review and add to future roadmap.

simonaubert_bd
13 - Pulsar

Hello @harsh_alang 

Can we have some news about this topic? Just like all the in-db ideas, it doesn't seem to be taken seriously by Alteryx, despite popularity.

Best regards,

Simon

harsh_alang
Alteryx
Alteryx

Hello Simon,

 

This request has been added to the 2024 roadmap. We hope to deliver this in the first half of the year.

 

thanks

Harsh

simonaubert_bd
13 - Pulsar

Hello @harsh_alang  Nice, I hope other in-db ideas will be added on the roadmap as well.