Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!
Submitting an Idea?

Be sure to review our Idea Submission Guidelines for more information!

Submission Guidelines

Make In-DB Queries much more efficient (dont nest everything or use Select *!)

I reported this to the support team but was told it was by design and to post here.

 

In-DB Inefficient SQL

I would like to report that the In-DB tools are generating horribly inefficient SQL code for simple operations.  It seems no matter what tools you use every statement is starting with a nested 'Select * From'.

 

Example Simple workflow:

  Support1.jpgSupport2.jpg

 

This is a simple Select and Group by but the SQL Generated is:

 

SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit"

FROM (SELECT * FROM "_SYS_BIC"."shell.app.gsap.FL000_LSC.FL002_CTS.INT.RPT/CA_CTS_RPT_MAIN_001") AS "a"

GROUP BY "ShipTo", "ShipTo_Name"

 

This is taking a very long time to execute:

 

Statement 'SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit" FROM ...'

successfully executed in 15.752 seconds  (server processing time: 15.699 seconds)

 

Whereas if I take the same query and remove the nested Select *:

 

SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit"

FROM "_SYS_BIC"."shell.app.gsap.FL000_LSC.FL002_CTS.INT.RPT/CA_CTS_RPT_MAIN_001" AS "a"

GROUP BY "ShipTo", "ShipTo_Name"

 

It is very quick:

 

Statement 'SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit" FROM ...'

successfully executed in 1.211 seconds  (server processing time: 1.157 seconds)

 

So Alteryx is generating queries up to x13 slower than they should be thereby defeating the point of using In-DB.  As you can imagine in a workflow where we have multiple Connect In-DB tools this is a really substantial amount of time.  Example used above is from SAP HANA DB has 1.9m rows and ~90 columns but we have much bigger tables/views than this.

 

If you look you will see its same behaviour for all In-DB tools where each tool creates another nested Select with its particular operator.

 

MY SUGGESTION:

So my suggestion is that Alteryx should combine the SQL of the first few tools and avoid using SELECT * completely unless no Select tools have been used.  So it should combine:

- Connect In-DB + Select

- Connect In-DB + Filter

- Connect In-DB + Summarise

 

Preferably it should combine/flatten everything up until the first join or union.  But Select + Filter are a must!

 

Note it seems some DB's can cope OK with un-nesting these big nested queries in the query plans for some Tables but normally not for Views.  But some cannot cope at all and so the In-DB tools cannot even be used to Browse 100 records (due to select *).

18 Comments
carlabernethy
5 - Atom

Totally agree. X13 the speed really defeats the object of moving to In-DB tools.

adrianloong
11 - Bolide

+1

hsinom1
6 - Meteoroid

+1, please simplify the SQL generation. The purpose of Alteryx is to simplify the ETL process. The code needs to be optimized for a copy and paste into whatever ETL engine we use.

ChaseA
Alteryx
Alteryx

+1

Atabarezz
13 - Pulsar

+1 I've been there multiple times, In-db queries work so inefficiently... please fix it!

bora_perusic
Alteryx Alumni (Retired)

We are starting to review this idea and look for potential solution(s). We are desperately looking for real life examples of workflows (and generated queries) that you think are underperforming. This is not about us not believing :) , it is about collecting the requirements for the potential optimizer.

Anyone who would be willing to share the workflow and at least partly the underlying db metadata will be welcome. Can be done either publicly here or privately (just ping me, please)

 

Thank you for the help in advance

 

Bora

cbridge
7 - Meteor

Hi Bora - did you see the example I posted in the original post?  The issue is the initial Select *'s.  Is my suggestion not possible?  

 

Note that certain databases on some occasions (like SAP HANA) when they make the SQL Plan are removing the initial Select * and executing efficiently but if the View is complex this is not happening.  SQL Server does not seem to remove.

 

I think with the logic I suggested you can fix for all DB's/use cases.

bora_perusic
Alteryx Alumni (Retired)

Hi Chris,

yes, I have and I understand, I am just looking for more examples and more complex ones, where the solution is also more complex.

Thanks

 

Bora

stj1120
8 - Asteroid

Hi all,

 

I'm new to alteryx and I need to replicate the SQL Server Views in alteryx as a workflow. However, i'm using In-DB tools to connect to the server and I'm facing lots of problems when trying to replicate sub queries in alteryx. it is taking akmost > 5 minutes to run the workflow. 

 

The below is one sub query in the View table which i'm trying to accomplish in the Alteryx. But, I'm getting stuck as I'm not getting the desired output. The data is not passing to Union tool. Can anyone help me on how to create the right workflow? 

 

(SELECT NAME AS ToTDescription 

 FROM SELAS_XT.dbo.S_EVT_ACT_TOT 

 

WHERE (ROW_ID = 

                            (SELECT MAX(ROW_ID) AS Expr1 

                              FROM SELAS_XT.dbo.S_EVT_ACT_TOT AS S_EVT_ACT_TOT_5 

                             WHERE (OWNER_LOGIN = ToT.OWNER_LOGIN) AND (OWNER_PER_ID = ToT.OWNER_PER_ID) AND  

                             (OWNER_POSTN_ID = ToT.OWNER_POSTN_ID) AND (FC.CalDate BETWEEN CAST(APPT_START_DT 

                             AS DATE) AND CAST(APPT_END_DT AS DATE))))) AS TotDescription

 

Thanks in advance.

Capture.PNG

Best Regards,

Sreenivasa Teja.

Atabarezz
13 - Pulsar

What is the error message you get?