This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
If you are developing with Alteryx's In-Database tools you may want to share the workflows with users who do not have as much database experience. Below is an example workflow to create a macro that will allow your users to select which fields are brought out of the in database tools without interacting with the in database tools directly.
The premise of the process is that you can stream out a single record of your table, then leverage a List Box tool in select mode to de-select the fields your user does not want to pull. From there, you can use Field Summary and Summarize to create a select statement that will be used to update a Dynamic Input In-DB tool.
To run in your environment, update the Connect In-DB tool to a connection that exists on your machine and start with a select all query.
I am trying to set up a System alias, but the System option is grayed out or missing. How can I create a System alias?
Standard Connection - System connection is grayed out:
In-DB Connection - System connection is not an option:
Alteryx must be run with elevated privileges in order to be able to create a System alias connection within Alteryx. If you have certain permissions, you can run Alteryx as an administrator and then try and create it. Simply right-click on the Alteryx program (or on the Alteryx icon on your desktop) and choose 'Run as administrator' (see below). Otherwise, you will have to get administrative rights in order to add a System alias.
More information from the Help section:
Thanks for reading!
As users collaborate, the sharing of workflows that connect to databases can be helpful. The In-DB Connection File type allows for this. A database connection is saved as an .INDBC file so it can be packaged with a workflow.
To create an In-DB Connection File that can be used on your workflow, go to Options > Advanced Options > Alias Manager > In-DB Connections.
Select the Data Source from the dropdown (1), for our example we will select Oracle, then pick File for the connection type (2), this option will ask you to navigate to a folder where the INDBC file will be stored and also will ask you to provide a name for this file (3).
Then proceed to set up your connection as you would normally do by selecting the appropriate driver and setting up the connection string for Read & Write.
Now that your connection file is created, the next step is to connect to your database using this file. Bring a Connect In-DB into your canvas, click on the dropdown for the Connection name and select Open File Connections, browse to your file and select it.
The next thing you will see the list of tables.
Select the table and/or write your query in the SQL Editor and proceed to build your workflow.
Now that you are ready to share, package your workflow by going to Options > Export workflow, notice that the INDBC file is being packaged with the workflow.
At the receiving end, the database drivers/client, data sources for that database should be installed and set up. The INDBC file looks like below, the password is encrypted.
Note: Workflow and all images created in this article are from Alteryx version 10.1
To create a pseudo random sample of data using the In-Database (InDB) tools, this could be a potential solution.
(This example uses Microsoft SQL 2008)
A set of 10 records are used in the workflow and we want find a ‘random’ 50% of the records.
If we were to use the InDB Sample tool and choose to sample 50% of the records we’ll just get the first five records, not very random.
Instead, using the InDB Formula tool we can create a new field, in this case it’s being called RandomID, and use the MS SQL expression NEWID() which will generate a GUID (this expression will probably vary depending on which database is being used).
Now we can feed our data to the InDB Sample tool. Configure the InDB Sample tool to sort by the RandomID field and select the number or percentage of rows to return.
Running the workflow our results look like this, notice the BankerIDs that are returned as compared to the original data
The workflow would look like this:
(A workflow is not included with this article as it requires a specific database connection.)
Alteryx has the ability to create and query Pre and Post SQL statements. Using this feature is good if you need to create a temporary table, delete or update the table before running the select. With this, you also have the ability to run stored procedures (via exec stored_procedure;) before and after the actual query as well.
Hive ODBC can be slow when writing to tables. If you are looking for a faster option to write to Hive and want to create a new table or overwrite an existing table, use the IN-DB tools to output your data.
We are working diligently and intelligently on making connections to databases easier, more secure, and faster in Alteryx. Even when we can make the best possible experience for our users come to fruition, any user may still encounter some issues due to the use of third party drivers and/or the intricate communications between the database, driver, and Alteryx. To help to minimize those issues, Alteryx is tested with an abundance of "supported" databases and drivers that are proven to be compatible but, unfortunately, you just can't test every database and every driver out there. Our technical specifications page found here will show you those databases Alteryx supports, as well as the drivers we have tested and support with respect to each database. This does not mean Alteryx will not connect to a database or driver you do not see on this page (or links within the page), however, Alteryx can not support those databases and/or drivers if troubleshooting or setup is required.
With Alteryx, the power to blend, clean and perform advanced analytics on disparate data is as easy as dragging and dropping tools with the click of a mouse. The new release of Alteryx Designer 10.0 expanded our predictive analytics tools to include MB Affinity, Network Analysis, In-DB Linear and In-DB Logistic regression tools.
Alteryx also has the flexibility to add additional R packages not integrated with our robust collection of predictive tools. We have had a question arise from one of our clients asking where they can find exponential (non-linear) regression. The recommended the R package for this type of analysis is the “nlstools”.
For more information on this R package please find the below link for reference:
Also, please find the link for the Alteryx Gallery app below for reference on how to install additional R packages. If you need help with the process, please let us know, and we can get our R experts in Customer Support to assist with the setup. Just email us at email@example.com.
Client Services Representative
Apart from making standard database connections, which allow for data to be pulled from the database directly into alteryx, the designer also has In-Database functionality. The In-database tools allow for the workflow created to be converted into a specific technology’s SQL language and the workflow to be executed within the database and not locally on alteryx. This will allow for subsetting of datasets by joining or filtering before the data is pulled over a network into the alteryx designer, resulting in faster workflow run times amongst other benefits
Open alteryx designer and drag and drop the “Connect In-DB” tool (Located in the “In-Database” category) onto the canvas. Select the “Manage Connections…” option to establish a new In-DB connection
The ‘Manage In-DB Connections” window should appear. To configure this window:
Select the Data Source/Database type from the “Data Source” dropdown
Under “Connection Type,” select whether you will be setting this connection up on the user or system level. NOTE: Administrator access will probably be required for “System” level. Setting up as the default “User” connection is generally sufficient
Under “Connections,” select “New” to create a new connection, or select a previously created connection from the dropdown
Under “Connection Name,” enter a name you would like for this connection you are creating
Under “Driver,” select the method of connection you would like to set-up, ie, “ODBC”
Click the “Connection String” dropdown arrow. You can choose to use an existing connection or create a new connection by clicking the “New database connection…” option
Follow this step if you chose to create a new connection in the previous step. Please proceed to the following step if you chose an existing connection:
If setting up an ODBC connection, please see the “Connecting to an ODBC datasource” guide on how to configure this part of the ODBC connection starting at the “***Step” step.
If setting up an OleDB connection, please see the “Connecting to an OleDB datasource” guide on how to configure this part of the OleDB connection starting at the “***Step” step.
After either successfully choosing to use an existing connection or establishing a new connection, please click on the “Write” tab and select the same connection
Click “OK.” If you were able to successfully connect, the “Choose Table or Specify Query” window should appear.
Select the table you wish to connect to from the “Choose Table or Specify Query” Window
Upon successful connection, the connection will now be stored in the “Connect In-DB” tool’s “Connection Name” dropdown
You have successfully set-up and saved an In-Database connection!!!
-Ali Sayeed, Solutions Engineer
Suppose you're using the In-DB tools and at some point you want to sort your data... in database. You notice there isn't a Sort In-DB tool available. (NOTE: you're not alone... see http://community.alteryx.com/t5/Alteryx-Product-Ideas/In-Database-Sort-Tool/idi-p/13160). Do you need to stream your data out, sort your data and then stream your data back in to the database? You could. But you don't have to. You can get the same results using the Sample In-DB tool. (Help regarding how to configure the Sample In-DB tool is available here: http://help.alteryx.com/current/index.htm#LockInSample.htm?Highlight=in-db%20sort).
In the example below, we're connecting to a Teradata database and reading a retail customer file containing 1500 records but the process is database independent. The Sample In-DB configuration is the important piece to the solution. Select 'Percent' in the dropdown and sample 100%. Check the 'Sample records based on order' checkbox and select the field(s) want to sort and what order you want the data sorted (Asending/Descending).
If you decide to stream your data out as some point, you have the option of sorting your data in the Data Stream Out tool as well.
If you attach a Browse In-DB tool to your Sample In-DB, set the 'Browse first N records' to 0. This will display all records in the order you specified. At the time of this writing, any value other than 0 may not display results in the proper sort order (solution pending).
Attached workflow was created using Alteryx Designer v10.1. This workflow will need to be configured to your specific database environment and logon creditials to work properly.
Alteryx provides GUI tools that offer similar functionality to many SQL commands. Although minimal SQL scripting may be necessary in order to properly configure tools, the amount required to complete analysis is significantly reduced. In particular, the following comparison operators are frequently used in filter and formula tools:
is not equal to
greater than or equal to
less than or equal to
Additionally, the operators AND, OR, NOT, and IN are also used in the configuration of tools such as Filter and Formula. It is also good to note that Alteryx is capable of recognizing both single and double quotes, as long as the syntax is kept consistent, and that tool programming is not case-sensitive.
Input tables by default will show all available columns. Columns passed downstream can easily be altered or added/deleted by using a Select tool, located in the Favorites Toolbox. This includes changing names and formats, such as V_String to Int, as well as changing the order in which they appear within the table.
'Where' conditions can be controlled using the Filter tool, located in the Favorites Toolbox. By selecting a column, the conventional SQL operators listed above can be used to set conditions to filter on. When the condition specified is met, the data passed downstream can be accessed from the T side of the Filter tool. It is nice to note that data that does not meet the condition is also accessible from the F side, if it is desired for analysis.
The Group By functionality can be accomplished using a Summarize Tool, located in the Favorites Toolbox. Any column may be selected, and grouped by within this tool. The Summarize Tool also offers a multitude of other functionality, such as counting and common numerical and spatial functions. When using this tool, it is important to remember that only the columns selected and configured within the tool will be passed downstream. If other fields are desired for analysis, they can be easily accessed using a Join Tool immediately downstream.
The Join Tool will default to produce an inner join. Fields can easily be removed, re-ordered, and renamed within the configuration of the Join Tool to ease analysis.
Cartesian join functionality can be easily accomplished using an Append Fields Tool. Cartesian joins yield a cross product of the two tables
LEFT JOIN (Also known as LEFT OUTER JOIN)
In SQL, a left join, also known as a left outer join, will return every record from the table on the left, regardless of whether it joins to a record from the table on the right. Records without a join will show the missing field as Null. To achieve this in Alteryx, the data that is output from the center of the Join Tool can be union with the data that is output from the left side of the Join Tool.
RIGHT JOIN (Also known as RIGHT OUTER JOIN)
A right join, also known as a right outer join, will return every record from the table on the right, regardless of whether it joins to a record from the table on the left. To achieve this in Alteryx, the data that is output from the center of the Join Tool can be union with the data that is output from the right side of the Join Tool.
FULL OUTER JOIN
A full outer join will return the conceptual result of a right and left join. All records from both tables will be returned regardless of whether or not a join occurs. In cases where a join does not occur, Null values will be returned.
Order by functionality can be accomplished using a Sort Tool. Any field can be selected and sorted using this tool, in either ascending or descending order.
A SQL SUM command will yield a numeric total for the column selected. This can be replicated in Alteryx with a Summarize Tool, where any column can be summed by choosing the Sum option from the Numeric drop-down menu.
A SQL MIN command will return the lowest value present in the column selected. This functionality can be duplicated in Alteryx by choosing the Min option from the Summarize Tool.
A SQL MAX command will return the highest value present in the column selected. This functionality can be duplicated in Alteryx by choosing the Max option from the Summarize Tool.
A SQL AVG command selects the average value for a certain table column. This can also be replicated using the Summarize Tool, by choosing the Avg option from the Numeric drop-down menu.
A SQL DISTINCT command can be combined with a SELECT statement to return an output with unique values for a certain database table column. This can easily be accomplished in Alteryx by using a Unique Tool.
Establishing Connections Is the driver support the same for In-Database and standard Alteryx workflows?
With In-Database, driver support has been simplified in order to optimize for speed:
For 32-bit Alteryx, 32-bit drivers for SQL Server and Oracle are supported with In-Database.
For 64-bit Alteryx, 64-bit drivers for SQL Server and Oracle are supported with In-database.
32-bit drivers are not supported for 64-bit Alteryx.
What permissions are required to initiate an In-Database workflow using the Connect In-DB tool? Read privileges are required to access the underlying database. What permissions are required to output data to the database using the Write In-DB tool? Write privileges are required to create a table in the database. What permissions are required to stream data into an In-Database workflow using the Data Stream In tool?
Either read-only or write privileges are sufficient when using SQL Server.
Write privileges to the GLOBAL TEMPSPACE directory are required when using Oracle.
What happens to the temporary tables created by the Data Stream In tool? The temporary tables are deleted at the end of the run. If Alteryx crashes while the Data Stream In tool is being run, then the next time that an In-DB module, all temp files created by Alteryx in the db in the previous three days are cleaned out. Are the underlying ‘rules’ (e.g., database time outs) maintained during an In-DB process, or does Alteryx modify or circumvent those? The underlying ‘rules’ are maintained during the process the same as with the DB connections via the standard Input and Output Data tools. If there is a database timeout or if there is a limit to the number of queries per day that the user can run, it will affect the user’s connection to the database.
SQL Statement Generation
How is the SQL statement created? A SELECT statement is triggered by the Connect In-DB tool and additional queries are created by downstream tools and nested within this query. The addition of one of the following three tools completes the query and sends it to the underlying database: Write In-DB, Data Stream Out, Browse In-DB.
A user can input his or her own SQL statement in the Query box for the Connect In-DB tool, which also gets embedded within the SELECT statement.
When is the SQL statement processed? The SQL query for the underlying database is triggered at runtime for each Browse, Data Stream Out or Write Data tool. How is Browse Caching Enabled? The Browse Data In-DB tool can be configured to cache the data as a .yxdb file when the workflow is run. Once the data cached, if the workflow is re-run and the database connection or query (including the number of records to browse) has not changed, the query will not be re-run. Instead, the data will be pulled from the cache.
An output message indicates whether or not the data was cached. Clicking the link will open the data results as a .yxdb file in a separate window. What are the differences between caching in the Input Data tool and caching in an In-DB workflow? Using the “Cache data” option in the Input Data tool prevents redundant data transfer from a database to Alteryx, which can significantly reduce the time it takes to run a workflow. Using the “Enable caching” option in the Browse Data In-DB tool prevents the workflow from sending redundant queries to the underlying database. The “Enable caching” option is on by default. When is the In-DB cache used? The In-DB cache is used any time a workflow is re-run without changes to upstream tools. Making a change to any upstream tool will trigger a new query and a new cache will be created. Does the “Browse first  records” option impact the amount of data processed by the workflow? No, the “Browse first  records” option only limits the number of records displayed in the Browse. Other tools in the workflow will process the number of records that pass through at any given point For more information, please see the In-Database Overview in the Alteryx Help. Huge thanks to Maureen Wolfson for the suggestion and passing this along, until next time! Chad Follow me on Twitter! @AlteryxChad
Question Have you ever wondered what query was being sent to your database at a certain point in your workflow?
Answer You can view this using a Dynamic Output In-DB tool! All you have to do is place a Dynamic Output tool after any tool in your In-DB process in order to see the query that is being sent to the database. Once the tool is placed in the desired spot in your workflow, make sure that you have the "Query" option selected in the Dynamic Output tool's configuration:
You can now run the workflow and since the output tab of the Dynamic Output tool is a normal connection tab, you will be able to view the query using the browse everywhere feature and the results window. If you are interested in viewing the entire query and/or copying and pasting it you will have to add a browse tool after wards so you can access the query string in the Cell Viewer:
A "fun fact" with this process is that you can take the query that is produced by the Dynamic Output tool and paste it into the SQL Editor of a normal Input tool to produce the same results:
User versus System DSN
When setting up your database connection you may be unsure whether selecting a user or system DSN is the best option.
The main question you need to ask yourself is whether anyone else within your organization will need to see the same database connection? If the answer is no then a User DSN would be the correct choice, however if you are looking to share a workflow then the system DSN would be a better choice.
A User DSN is for a specific user on a workstation. For example, if I log into my machine as ‘Jordan’, set up the DSN as myself and then log back in I will be able to see that database connection. However, if my colleague ‘Tony’ logs onto my machine he will not be able to see that database connection. For Tony to see this connection I would need to set up a system DSN, so no matter who logs into my machine they will be able to see and access that database connection.
A system DSN can often be advantageous when sharing workflows within a department because if someone makes that database connection it can be shared amongst colleagues. If this creates concern for your IT department it is really easy to set up an Alias for this connection and as a result the username and password for that connection can be hidden.
The user and system DSN can be configured by using the two tabs marked below in the ODBC admin.
Client Services Support Engineer