Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

FAQ: How Do the In-Database tools Work?

DiganP
Alteryx Alumni (Retired)
Created

In-Database Tools FAQ


This article provides answers the most frequently asked questions related to In-Database Tools. 
 

Prerequisites

 
  • Alteryx Designer 
    • Version All
  • In-Database Tools (Overview)
 

How do the standard Alteryx tools work when processing data from a database?

 

  • When processing data from a database, a standard Input Data tool downloads all of the specified input data to a temporary file on the local machine. All subsequent processing in Alteryx occurs locally. This process can be slow when experiencing high network latency when connecting to large tables in a database. The large input dataset (wherever it resides) has to be brought over the network to your computer and then processed.
 

How do the In-Database (In-DB) tools work?

 
  • In-Database processing enables blending and analysis against large datasets without the data moving outside of the database. In-DB tools make a connection to your database, and everything is processed in the database rather than bringing the data over the network to your local computer. Data will only be streamed out of the database and on to your local machine if you use a Data Stream Out or Browse In-DB tool. This can provide significant performance improvements over traditional analysis methods that require data to be moved to a separate environment for processing.
  • The In-DB tools are designed to allow you to visually build a SQL query which is processed on the database. Alteryx-specific functions will not work for In-DB tools such as Filter In-DB or Formula In-DB.

 

indb1.png
 

When would I want to use In-DB tools instead of standard tools?

 
  • An example when you might want to use In-DB tools instead of standard tools would be if you have a huge table in Oracle, and you want to blend it with a small Excel file and then write a new table in Oracle. You can use the In-DB tools to stream the Excel data into a temporary table in your database and then use the Join In-DB tool. This would reduce the overall data movement over the network and optimize your run time. Of course, if your endpoint is a different database or an Excel file, you will need to stream the data out of Oracle regardless.
 

indb2.png

 

In what cases can you expect In-DB workflows to be slower than regular tools?

 
  • This really depends on the type of database your using and how fast it can process database queries. Some databases such as SQL Server are optimized to handle queries much faster and others can be slower. Workflows containing multiple joins and unions can increase the length and complexity of the query taking more time to process. In such instances bringing in data with normal input tools (over a low latency network) and processing in designer can be quicker especially with the introduction of the new AMP Engine. The following shows how Join/Unions increase the nested query.

The SQL statement from the first inner join

image.pngimage.png
 

The SQL statement with the introduction of another data source and a Union Tool increases the nested query

image.pngimage.png
 

What database permissions do i need to use the In-DB tools?

 
  • Read permissions are required to access the underlying database.
  • Write privileges are required to create a table in the database.
  • It is necessary to have CREATE permissions to be able to Stream in data to the database and to write data out to the database. The exception is Microsoft SQL Server.
  • To allow visibility of a temporary table across sessions, Alteryx must create a permanent table that is eventually deleted at the end of a workflow.
 

How does In-DB generate the database specific SQL queries?
​​​​

  • Database specific queries are mostly hardcoded for In-DB tools and for each supported data source. Each data source attempts to use generic functions were possible and the syntax changed for others to ensure the query run successfully. Lua Scripts can also used with In-DB tools to modify database specific functions that typically may not work with Alteryx out of the box. Such as mapping database specific datatypes with data types in Alteryx. The default location for Lua scripts can be found under: C:\Program Files\Alteryx\bin\RuntimeData\ODBC for admin installs and  C:\Users\user name\AppData\Local\Alteryx\bin\RuntimeData\ODBC for non-admin installs.
 

How do sessions and temp tables work with In-DB Tools?

 
  • In-DB tools have the added feature of caching sessions during a run. If multiple In-DB tools use the same connection (be it user, system or file), only one active session will be created for the workflow. Only read and write sessions are cached separately.  These sessions are only active when the workflow is run (specifically during engine run). Once the queries have run and the session is no longer required, Alteryx will check for any temporary tables created during the run, clear them and close the connection.
  • If a workflow fails or crashes during a run, this may fail clear temporary tables. Certain In-DB tools Database stream In/Out will perform an extra check for temp tables older than 3 days and automatically clear them.
 

Are there any Stored Procedures which are not supported?

 
  • Dynamic stored procedures are not supported due to a ODBC limitation. If stored procedure are dynamic in way where the result set cannot be retrieved without first running the stored procedure, then the metadata cannot be retrieved hence no field parameters to write to. In this instance a simple workaround would be to manually create the column set to allow the store procedure to run
  • Store procedures that don't return data are not supported.
 

Why does my workflow appear to be "stuck" on the data stream out tool?

 
  • In-database processing requires 64-bit Alteryx with 64-bit database drivers.
 

Additional Information:

 
  • In-database processing requires 64-bit Alteryx with 64-bit database drivers.
    • List of all data sources here.
  • You cannot cross databases with In-DB as there will be an architecture mismatch.
  • Spatial objects are currently not supported with In-Database tools.
  • The interactions between the ODBC In-DB Tools and ODBC Regular Input/Output tools are mostly the same. In-DB Tools have the added feature of caching the session to the database. 
  • There are six predictive tools that have in-database support. When a predictive tool with in-database support is placed on the canvas with another In-DB tool, the predictive tool automatically changes to the In-DB version. To change the version of the tool, right-click the tool, point to Choose Tool Version, and click a different version of the tool. You can find a complete list here.


Additional Resources

Comments
BorisTyukin
6 - Meteoroid

nice FAQ, I referenced it in my blog post as I was doing a quick evaluation of In-Database with Big Data engine Impala.

http://boristyukin.com/quick-evaluation-of-alteryx-in-database-tools/

fpinchon
8 - Asteroid

This is well written and helpful.

The part I still can't find is the underlying sql code for the Write In-DB tool... If you use the dynamic output tool afterwards, there is no detail of what action was performed in SQL...

If anyone knows...

 

Cheers,

Frederic

YEM
8 - Asteroid

@DiganP wrote:

In-Database tools FAQ

 

...

 

How do the In-Database (In-DB) tools work?

 

  • In-Database processing enables blending and analysis against large datasets without the data moving outside of the database. In-DB tools make a connection to your database, and everything is processed in the database rather than bringing the data over the network to your local computer. Data will only be streamed out of the database and on to your local machine if you use a Write Data In-DB, Data Stream Out or Browse In-DB tool. This can provide significant performance improvements over traditional analysis methods that require data to be moved to a separate environment for processing.

 


You mention that Write Data In-DB would stream the data out of the database into your local machine.  Is that correct?

The link here suggests the opposite: Write Data In-DB

 

If Write would stream the data out then back in, I would see no reason to ever use the In DB tools.

 

Appreciate any feedback you can give!

 

Thanks!

DiganP
Alteryx Alumni (Retired)

@YEM Good catch. What I meant to say is that the addition of one of the three tools (Write Data In-DBData Stream Out or Browse In-DB tool) completes the query and sends it to the underlying database. I'll fix this asap!

MK007
5 - Atom

Is it possible to use the  Write Data In-DB to write a dataset to "temp" table and then use some other IN-DB tool to execute a complex SQL statement that can use the "temp" table?

 

for visual reference this would be ideal.

2019-04-18 22_05_32ALXDST.png

I am aware of the In-DB join, however, this is a simplistic example. In reality I need to run a complex sql statement, which can perform a join as in the example. 

currentALXDST.png

The final output would be writing the blended data(claimStats.yxdb and InvoiceStats.yxdb) into a SQL table or generating some sort of report. I can't figure out how to make Container 37 dependant on Container 36 completing first. The "Block Until Done" doesn't appear to be able to link to the "Input Data" tool. 

 

fharper
12 - Quasar

@MK007 others can correct me if I am wrong, I am fairly new to exploring In-DB but you cannot cross databases with In-DB as there will be an architecture mismatch.  That said if it is a different table within the same DB then I think you can.  A lot depends on what exactly you want done. 

Container 36:

From what you show I think Container 36 is summarizing some "Customer" data and writing to a table.  If the table is in the same DB then the SQL in Container 37 could be done in In-DB tool form and joined.

 

Caveats/concerns:

  1. Hinges on the table written being in the same DB
    1. if customer data and invoice data are 2 tables in same DB then one query becomes like a sub-select or temp table or cte, I guess depending on the sql generator Alteryx uses.
    2. I suspect you don't need to write the temp table either.  One Connect In-DB for  but join after the
  2. Do you need to write out the yxdb's?  In-DB primarily saves time if the data stays in the DB engine, avoids writing out.
  3. If the object of the exercise is to get a report of invoices with customer details joined in then while it can be done in In-DB if you know SQL it is probably easier to just write the sql in ssms then copy to an input tool and write the result.  Essentially does the same thing equally efficiently since you will stream or write the result.  It is a reporting problem really... and not a cleansing or blending problem with update/insert and no data transfer out of the data engine.