
Alteryx Designer Desktop Knowledge Base


How can I make my module run faster?

JordanB
Alteryx

This is one of the questions clients ask me most often, and some treat it as a personal challenge to see how fast they can get their module to run. However, in the famous words of our Co-Founder Ned Harding: "Although Alteryx can be very fast, since it is such a general tool, it is only as good as the module..."

 

There are multiple strategies for improving the speed of a workflow, from using a Select tool to reduce field sizes to adjusting the default sort/join memory. Before any of them, however, the first fundamental step is benchmarking.

 

Benchmarking

  • The recommended process is to run the workflow three times. Uncached data can cause slower run times, so running the workflow three times ensures all the data is cached and the test is fair.
  • If you want to benchmark without cached data, you will have to reboot your machine between runs.
  • Once you have a stable total run time for the workflow, you can start optimizing. A sketch of this process, driven from the command line, is shown below.
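As a rough illustration of that loop, here is a minimal benchmarking sketch in Python. It drives the Designer command-line engine, AlteryxEngineCmd.exe; the engine path and workflow path are assumptions to adjust for your installation.

```python
import subprocess
import time

# Assumed paths -- adjust for your installation and workflow.
ENGINE = r"C:\Program Files\Alteryx\bin\AlteryxEngineCmd.exe"
WORKFLOW = r"C:\Workflows\MyModule.yxmd"

timings = []
for run in range(1, 4):
    start = time.perf_counter()
    subprocess.run([ENGINE, WORKFLOW], check=True)
    elapsed = time.perf_counter() - start
    timings.append(elapsed)
    print(f"Run {run}: {elapsed:.1f}s")

# Record the third run: by then the file cache is warm, so the timing
# reflects the workflow itself rather than cold disk reads.
print(f"Benchmark time (3rd run): {timings[-1]:.1f}s")
```

Note that Designer's own Cache Data option (covered below) does not apply to command-line runs; what warms up across these runs is the operating system's file cache.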

 

Optimizing your Workflow

Sort/Join Memory

Alteryx is designed to use all of the resources it possibly can: to run as fast as possible, it tries to balance its use of CPU, memory, and disk I/O.

  • Set your Dedicated Sort/Join Memory Usage lower or higher on a per-workflow basis depending on the use of your computer.
  • Sort work refers to the Sort tool and other tools that re-order your data. Join work refers to any of the Join processes.
  • If you are doing memory-intensive non-sort work (e.g. large drive-time calculations), lower it!
  • If you are doing memory-intensive sort work, increase it.
  • Go to Workflow - Configuration > Runtime tab > Dedicated Sort/Join Memory Usage > Use Specific Amount.
  • The Sort/Join memory setting is not a maximum memory usage setting; it is closer to a minimum. The allocated memory is split between all the tools that sort in your workflow, but other tools still use memory outside that sort/join block, and some of them (e.g. drive times with a long maximum time) can use a lot. A rough sizing heuristic is sketched below.
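There is no official formula for choosing a value. As a hedged illustration only, the sketch below computes one possible starting point: reserve half of physical RAM for the OS and non-sort work, then split the rest across the workflows you expect to run at once. The heuristic, the concurrency count, and the use of the third-party psutil package are all assumptions.

```python
import psutil  # third-party: pip install psutil

# Assumed heuristic, not an official Alteryx recommendation: leave half
# of physical RAM for the OS and non-sort work, then split the remainder
# across the workflows expected to run at the same time.
total_mb = psutil.virtual_memory().total // (1024 * 1024)
concurrent_workflows = 2  # hypothetical value -- set to your own usage

suggested_mb = (total_mb // 2) // concurrent_workflows
print(f"Total RAM: {total_mb} MB")
print(f"Suggested Dedicated Sort/Join Memory Usage: {suggested_mb} MB")
```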

Where do I find the Sort/Join memory options?

  • To set a user-level default dedicated Sort/Join Memory Usage, go to Options > User Settings > Edit User Settings > Defaults tab.
  • The global Default Dedicated Sort/Join Memory Usage at System level can be found at Alteryx > Options > Advanced Options > System Settings > Engine > Default sort/join memory usage (MB).

 

Lean for more speed!

 

Select tool

  • A best practice for optimizing the performance of your workflows is to remove data that won't be needed for downstream processing as early as possible. You can always bring that data back into the workflow later if necessary.
  • The Select tool removes fields or columns from your data. Other tools such as Join, Join Multiple, Spatial Match, and Find Nearest, and to a certain degree the Transform and Reporting tools, have select functionality built in that you can use to reduce the need for additional Select tools. The same principle is sketched below.
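Designer is configured through its UI rather than code, but the principle carries over to any data pipeline. A minimal pandas sketch (the file and column names are hypothetical):

```python
import pandas as pd

# Read the source, then immediately keep only the columns needed
# downstream -- the pandas analogue of an early Select tool.
df = pd.read_csv("customers.csv", dtype={"ZIP": "string"})
df = df[["CustomerID", "ZIP", "Revenue"]]
```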

Filter tool

  • Another good way to optimize workflow performance is to use the Filter tool to remove unnecessary data.
  • The Filter tool splits records based on criteria you specify, such as ZIP = 01001. You may choose to handle records that come out of the True output differently from those out of the False output by connecting additional tools to either side. This allows smaller amounts of data to be passed downstream, as in the sketch below.
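Continuing the hypothetical pandas sketch above, splitting records on a condition mirrors the Filter tool's True and False outputs:

```python
# Rows matching the condition go one way, the rest go the other --
# the analogue of the Filter tool's True and False outputs.
mask = df["ZIP"] == "01001"
true_output = df[mask]    # records that meet the criteria
false_output = df[~mask]  # everything else
```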

Auto Field Tool

  • Optimize your workflow for speed by setting each field to the most efficient type and smallest possible size.
  • Use the Auto Field tool right after your Input Data tool to assign the most efficient type and size to your fields.
  • Another benefit of using the Auto Field tool is that it reduces the size of your output file. A rough analogue is sketched below.
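As an approximation (not a one-to-one mapping of what Auto Field does), the pandas sketch below shrinks each column of the frame from the earlier sketch to the smallest type that still holds its values:

```python
import pandas as pd

# Downcast numeric columns and compact repeated strings so each field
# uses the smallest type that still fits its values.
for col in df.select_dtypes("integer"):
    df[col] = pd.to_numeric(df[col], downcast="integer")
for col in df.select_dtypes("float"):
    df[col] = pd.to_numeric(df[col], downcast="float")
for col in df.select_dtypes("object"):
    df[col] = df[col].astype("category")

print(df.memory_usage(deep=True))  # confirm the reduction
```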

Enable Performance Profiling

  • This option gives you a millisecond and percentage breakdown per tool in your workflow.
  • Having this breakdown lets you pinpoint the slower tools/processes in your workflow and apply the methods suggested in this article to improve them.
  • Performance profiling can be found at Workflow - Configuration > Runtime > Enable Performance Profiling. The sketch below mimics this kind of per-step breakdown.
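Outside Designer, the same idea amounts to timing each step and reporting a percentage breakdown. A minimal sketch, with hypothetical steps standing in for tools:

```python
import time

timings = {}

def timed(name, func, *args, **kwargs):
    # Run one step and record its duration in milliseconds under `name`.
    start = time.perf_counter()
    result = func(*args, **kwargs)
    timings[name] = (time.perf_counter() - start) * 1000
    return result

# Hypothetical steps standing in for workflow tools.
data = timed("input", lambda: list(range(1_000_000)))
data = timed("filter", lambda d: [x for x in d if x % 2 == 0], data)
data = timed("sort", sorted, data, key=lambda x: -x)

# Report slowest-first, with each step's share of the total run time.
total = sum(timings.values())
for name, ms in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name:>8}: {ms:8.1f} ms ({ms / total:5.1%})")
```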

Disable All Browse tools

  • The Browse tool quickly becomes a data artisan's best friend: it lets users see and review the entire data set at any given step while building a workflow. However, each Browse tool writes a temporary .yxdb file, and writing these files takes time and slows down processing.
  • There is an option to simply disable them all, so they can be easily re-enabled later. This setting can be found at Workflow - Configuration > Runtime > Disable All Browse Tools.

 

 

Set your limits: Record Limit for the Inputs

  • When developing your workflow, there is no need to bring in all of your data for testing.
  • Use the Record Limit option in the Input tool's Properties to bring in just enough records for testing; the sketch below shows the same idea outside Designer.
  • If you want to set a limit for all Input tools in your workflow, you can also do this on the Runtime tab under Workflow - Configuration.
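In pandas, the equivalent of a record limit on input is reading only the first N rows while developing (the file name is hypothetical):

```python
import pandas as pd

# Read only the first 1,000 rows during development; drop nrows for the
# full production run.
sample = pd.read_csv("customers.csv", nrows=1000)
```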

 

Tool Containers

  • The Tool Container allows the user to organize a workflow better by combining tools into logical groups.
  • Tool Containers can be disabled to run only certain portions of the workflow, effectively bypassing tools for a quicker run.

Cache Data

  • Designer has the ability to cache data from relational databases through the Input Data tool, or via right-clicking a tool and selecting Cache and Run Workflow (see here for more details).
  • When the option is enabled, data is stored in a .yxdb file on disk so that data sources are not hit repeatedly during workflow development.
  • Data can only be cached when running a workflow in a Designer session. The setting is ignored when the workflow is run by the scheduler, in the Gallery, or from the command line. The sketch below shows the same caching pattern in plain Python.
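The pattern itself is worth knowing outside Designer too: read from a fast local cache if it exists, otherwise hit the slow source once and save the result. A sketch with hypothetical file names:

```python
import os
import pandas as pd

CACHE = "input_cache.pkl"

if os.path.exists(CACHE):
    df = pd.read_pickle(CACHE)          # fast local read on later runs
else:
    df = pd.read_csv("big_source.csv")  # slow source read, done once
    df.to_pickle(CACHE)                 # cache for the next run

# Delete the cache file whenever the source data changes.
```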

 

Connection Progress

  • Connection progress is a great way to keep track of the number of records and the size of the data going from one tool to another. In addition, the thickness of the connection itself varies with the size of the data passing through (great for troubleshooting).
  • The default setting for Connection progress is "Show Only When Running"; leaving it set to "Show" lets you inspect the size of the data at any point even when the workflow is not running (Workflow - Configuration > Canvas > Connection progress).

 

If you want more detail on any of the points mentioned above, make sure to check out the great Tips and Tricks articles from Margarita Wilshire, et al.!

Tips & Tricks 2019

Tips & Tricks 2018

Tips & Tricks 2017

Tips & Tricks 2016

Tips & Tricks 2015

Tips & Tricks 2014

Comments
DultonM
11 - Bolide

Thank you @JordanB and everyone else who contributed to this article! I like the comprehensive nature of the Knowledge Base postings like this one!

 

I have one question for clarity. In the Benchmarking section you say "running the workflow three times should ensure all the data is cached". I often have workflows/apps that I leave open throughout the whole day and run multiple times but the source data files may change. In other words, the workflow configuration remains constant, but the data entering may be different. If "all the data is cached", is Alteryx not processing the latest data when the source file changes? Does the caching mechanism work differently for apps running on a Private Server? (Note: My source files do NOT have the "Cache Data" option in the input tool selected)

 

Again, thank you for the comprehensive and valuable article!

SophiaF
Alteryx

Hey @DultonM,

 

Alteryx will process the latest data when the source file is changed. @JordanB's suggestion was focused more on best practices for benchmarking - you want to be comparing apples to apples. These best practices are for when you are attempting to optimize a workflow where the data remains the same, but you are looking to shorten the run time. You would want to first run the workflow three times to ensure the data is cached, make the change you hope will optimize your workflow, then run it three more times. You would then compare the time from the first "3rd" run to the second "3rd" run, so that you are comparing fully cached data to fully cached data. That way you know that the workflow changes you made were fully responsible for the time decrease. As far as the server goes, caching and optimization would be the same type of process. Again, the optimization is to make sure that you are putting the "best" workflow together, whether that is up on the server or on your local machine.

 

Cheers,

DultonM
11 - Bolide

Ah, that makes a lot of sense! Thank you @SophiaF for the clarification!

davidhenington
10 - Fireball

This is excellent, thanks guys! 

 

What would be even better is this same approach, but for the predictive tools. I find that they do not leverage my desktop resources very well.