Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Performance difference between Input Data & In-DB tools

vroque
8 - Asteroid

Good Evening,

 

I have an issue while querying a table (10 columns and about 30 mi records) with Alteryx. When exporting the data from this query using MS SQL Server Management Studio it takes about 10 seconds, but when I try it with Alteryx, using the Input Data Tool, the workflow takes about 30 minutes to run.

In order to improve our performance, I’ve tried to use the In-DB tools and the workflow took about 20 seconds.

Is there any known issue that may be causing such a difference?

 

This is the query I’m using on both Input Data and Connection In-DB:

 

SELECT *

FROM  DATABASE_NAME

WHERE CONVERT ( VARCHAR , [DT_COMPTC],102 ) + CONVERT ( VARCHAR , [DT_UPDT],102 ) IN (SELECT CONVERT ( VARCHAR , [DT_COMPTC],102 ) + CONVERT ( VARCHAR ,MAX([DT_UPDT]),102 ) FROM DATABASE_NAME GROUP BY [DT_COMPTC]) and DT_COMPTC BETWEEN '2017-07-01' AND '2018-01-14' AND CNPJ_FUNDO IN ('CNPJ_1','CNPJ_2')

 

Here’s some info:

 

Data Types:

DT_COMPTC: Date

DT_UPDT: Datetime

CNPJ_FUNDO: Varchar

 

Alteryx Version: 2018.2.4.47804

 

Ping between computer and SQL Server: bellow 1ms

5 REPLIES 5
Claje
14 - Magnetar

Hi,

 

Are you using the "Microsoft SQL Server" connection option with your regular input data tool?

If you are, can you try swapping that connection to the same type as your IN-DB connection?

vroque
8 - Asteroid

Yes, both connections are similar and both uses ODBC driver 13 for MS SQL 

Claje
14 - Magnetar

A very minor tweak would be to explicitly limit the query to your 10 columns - I don't know if there are any more columns in the table that you don't need, but you could be pulling in excess data with the Input Data tool that is being better handled via IN-DB.

 

Are you using any other IN-DB tools to limit your dataset, like an IN-DB Filter?  That could also improve performance in the IN-Database space, but the same functionality in regular Alteryx tools can take more time.

 

Unfortunately, SQL performance is tricky to debug remotely.  The other thing I would recommend trying if you can is changing the driver just to see if you observe any performance change on the Input Data tool.

TravisCraven
6 - Meteoroid

Victor it is important to understand the key difference of how both options process. Using the data In-out open will run on your machine's RAM. Note: Your performance may dramatically improve once your job has been published to Alteryx Server, if your server has much greater processing power than your desktop machine.

 

When you utilize In-Database connections, the reason you are seeing tremendous performance gains is because Alteryx is converting the objects to SQL and processing directly through the SQL Server database engine itself. That is why there is a limitation on the amount of transformation objects available for In-Database. 

 

***An important note I have discovered in testing Server publishing***

Local database connections will throw alias errors for In-out connections and In-Database connections. In order to avoid this issue it is recommended to create the SQL Server connections on the Gallery Admin, then build your flow using that shared connection. This will allow for connection sharing and proper publishing.

The exception to this approach is in In-Database connections. In order to bypass the alias issue we needed to create file connections and store those connection files locally until we were ready to publish. In both connection cases once the app has been published, anyone with access to the workflow can download and update the flow running the job through those published credentials. If you try to save the In-Database workflow locally, the gallery connection will be severed and a new file connection will need to be created in order to publish. 

 

Hopefully this detail helps answer your question and saves a bunch of time troubleshooting. In addition, I am hopeful that an Admin UI update will be coming soon to streamline this process. 

 

 

Thanks,

 

 

BretCarr
10 - Fireball

I was curious if anyone has experienced this still to date. I setup in-db connections and they worked fantastically at first but then out of nowhere, the process time went from 20 seconds to 20 minutes. I changed nothing in the environment but now all my database queries are near unusable but are fine in SSMS. Thoughts?

Labels