
Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

Tips & Tricks 2016: Workflow Optimization

Alteryx Alumni (Retired)

The below is taken from the Tips & Tricks series presented at Inspire 2016. Special thanks to Margarita Wilshire and the Customer Support team for compiling these useful tips!



Resource Optimization 

 

  • Alteryx is designed to use all of the resources it possibly can. In order to make Alteryx run as fast as possible, it tries to balance the use of as much CPU, memory, and disk I/O as possible. The good news is that most of the resource utilization can be controlled. You can limit the amount of memory that is used on a system, user, or Workflow level.

  • The Sort/Join memory setting is not a maximum memory usage setting; it is more like a minimum. The allocated memory is split between all the tools that sort in your workflow, but other tools still use memory outside that sort/join block, and some of them (e.g. drive times with a long maximum time) can use a lot.

  • If a sort can be done entirely in memory, it will go faster than if Alteryx has to fall back to temp files, which is why it is good to set this higher. But if the total memory usage on the system pushes it into virtual memory, data will be swapped to disk in a much less efficient way and performance will be much worse, which is why setting it too high is the bigger concern.

  • The global Default Dedicated Sort/Join Memory Usage at System level can be found at Alteryx > Options > Advanced Options > System Settings > Engine > Default sort/join memory usage (MB)

  • To set a user level default dedicated Sort/Join Memory Usage, go to Options > User Settings > Edit User Settings > Defaults tab

    1.PNG

Resource Optimization Best Practices

 

  1. Memory Settings

    32-bit machines:
    Keep the setting on the lower, conservative side. No matter how much physical RAM is installed, only 1 GB at most is available; as soon as the setting goes higher, the machine will cross over into virtual memory and be unable to recover.
    A 32-bit machine should never have a setting over 1000 MB, and 512 MB is a good setting. Set it lower (128 MB) when using Adobe products simultaneously with Alteryx.
    Important Message on Alteryx Analytics Support for 32-Bit Windows Systems

    64-bit machines:
    Set this in the system settings to half your physical memory divided by the number of simultaneous processes you expect to run. For example, if you have 8 GB of RAM and run 2 processes at a time, your Sort/Join memory should be set to 2 GB (8 GB / 2 / 2). You might set it lower if you expect to run a lot of memory-intensive processes on the machine besides Alteryx.

  2. Set your Dedicated Sort/Join Memory Usage lower or higher on a per-Workflow basis depending on how you use your computer. If you're doing memory-intensive non-sort work (e.g. large drive times), lower it; if you're doing memory-intensive sort work, set it higher.
    Configuration > Runtime tab > Dedicated Sort/Join Memory Usage > Use Specific Amount

    2.PNG

  3. Run Alteryx at a lower priority: This will ensure that the Alteryx Engine runs at a lower priority than all the other applications running on the same machine. By doing so, even the Alteryx GUI will remain responsive when you are running a large Workflow in the background. This is an especially good idea for a shared server.
    Alteryx > Options > Advanced Options > System Settings > Engine

  4. Shared Servers: For a shared server, the system owner/IT person should set the memory to no more than (total memory-2GB)/(Number of Users). This way if all the users are running Workflows at the same time the system won’t go into virtual memory, which really slows things down.

  5. Web Servers: When running Alteryx on a web server, you really want to set the memory to the minimum possible without impacting the performance too much. We recommend trying a system memory setting of 64MB and then increasing the memory on a per Workflow basis as needed. It is important to note that the user setting for memory usually has no impact since the web service typically runs as a separate system user. Make sure to use the system settings.

  6. Background Processing: Any time you are planning to run a Workflow in the background while you are going to continue doing other work, it is a good idea to run it with less memory.

  7. Temp Directory: It is also a good idea to have the temporary directory point to a separate physical hard drive from your boot drive. If your temp directory points to C:\temp and you run a Workflow that consumes hundreds of GB of temp space (it happens), your system may become unstable.
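
The two sizing rules above (64-bit machines and shared servers) are simple arithmetic, so they can be sketched in a few lines of Python. This is only an illustration of the rules of thumb: the function names are made up for this example and are not part of Alteryx, and the result is just a starting value to type into the settings dialog.

```python
def sort_join_memory_mb(physical_ram_gb, simultaneous_processes):
    """64-bit rule of thumb: half of physical RAM divided by concurrent processes."""
    if simultaneous_processes < 1:
        raise ValueError("simultaneous_processes must be at least 1")
    return int(physical_ram_gb * 1024 / 2 / simultaneous_processes)


def shared_server_memory_mb(total_ram_gb, number_of_users, reserved_gb=2):
    """Shared-server upper bound: (total memory - 2 GB) / number of users."""
    if number_of_users < 1:
        raise ValueError("number_of_users must be at least 1")
    usable_gb = max(total_ram_gb - reserved_gb, 0)
    return int(usable_gb * 1024 / number_of_users)


# The example from the text: 8 GB of RAM, 2 processes at a time -> 2 GB.
print(sort_join_memory_mb(8, 2))       # 2048

# A hypothetical shared server with 32 GB of RAM and 5 users.
print(shared_server_memory_mb(32, 5))  # 6144
```

Both functions return megabytes because the system setting is entered in MB. Treat the shared-server value as a ceiling, not a target.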

 

Lean for Speed

 

Process Only the Data You Need with the Select & Filter Tools

 

A best practice for optimizing the performance of your workflows is to remove data that won't be needed for downstream processing as early as possible; you can always bring the additional data back in later if needed. The Select tool removes fields (columns) from your data. Other tools such as Join, Join Multiple, Spatial Match, Find Nearest, and to a certain degree the Transform and Reporting tools, have some Select functionality built in.

 

Useful tips when using the Select Tool:

  • Move highlighted field to top or bottom: Options > Move
  • To reorder multiple fields at once: Select, right-click and drag
  • Changed your mind? To revert to incoming field order: Options > Sort 

3.PNG

 

Another good way to optimize workflow performance is to use the Filter tool to remove unnecessary records. The Filter tool tests each record against the criteria you specify, such as ZIP = 01001, and splits the data into a True output and a False output. You can handle records from the True output differently than those from the False output by connecting additional tools to either side. This way, smaller amounts of data are passed downstream.
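
As a rough analogy outside of Alteryx, the same "select, then filter" idea can be sketched in plain Python. The sample records, field names, and helper functions below are invented for illustration and do not correspond to any Alteryx API; the point is only that dropping unneeded fields and records early leaves less data for every later step.

```python
# Made-up sample records; only ZIP and Sales are needed downstream.
records = [
    {"ZIP": "01001", "Sales": 120, "Notes": "free text that is never used"},
    {"ZIP": "01002", "Sales": 75, "Notes": "free text that is never used"},
    {"ZIP": "01001", "Sales": 40, "Notes": "free text that is never used"},
]


def select_fields(rows, fields):
    """Keep only the listed fields, like unchecking fields in a Select tool."""
    return [{name: row[name] for name in fields} for row in rows]


def filter_rows(rows, predicate):
    """Split rows into (true_output, false_output), like a Filter tool."""
    true_output = [row for row in rows if predicate(row)]
    false_output = [row for row in rows if not predicate(row)]
    return true_output, false_output


slim = select_fields(records, ("ZIP", "Sales"))
matches, others = filter_rows(slim, lambda row: row["ZIP"] == "01001")

print(len(matches), len(others))  # 2 1
print(sorted(matches[0].keys()))  # ['Sales', 'ZIP']
```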

 

4.PNG

 

Assign most efficient data types with the AutoField Tool

 

Optimize your workflow for speed by setting each field to the most efficient field type and the smallest possible size. Large string fields can be costly, and carrying them through your workflow will slow it down. Use the AutoField tool right after your Input Data tool to assign the most efficient type and size to your fields.

Below are the data types before and after the AutoField tool.

 

5.PNG   6.PNG

 

Another benefit of using the AutoField tool is that it will reduce the size of your output file.

 

7.PNG

 

 

Speed up Processing


Disable All Browse tools


The Browse tool quickly becomes a data artisan's best friend: it lets you review the entire data set at any given step while building a workflow. However, each Browse tool creates a temporary yxdb file, and writing these files takes time and slows down processing. When the workflow is ready for production, it is better to remove them. There is also an option to simply disable them, so they can easily be re-enabled if needed. This setting can be found at Workflow > Runtime > Disable All Browse Tools


8.PNG

 

 

Modify User Settings

Several options in the User Settings > Advanced tab can also improve performance.

 

User Settings -  Advanced Tab.jpg

 

1- Undo Levels.  By default, you can undo (CTRL+Z) 25 times.  To be able to undo this many times, data needs to be stored in memory.

You can decrease the Undo Levels if you need to save memory and improve performance.

 

2- Disable Auto Configure. This option stops the metadata from being loaded every time you add a new tool while developing a workflow. Press F5 to load the metadata only when needed.

 

3- Autosave interval in Minutes.  By default, Designer saves a version of the workflow every 10 minutes.  If you think you have lost your work, this very handy option can save your skin.  However, it can also use processing power when you do not expect it.  You may want to increase the autosave interval to improve performance.

 

4- Tool Results Settings. This applies to the little anchor next to most tools, which shows results just like a Browse tool but with a limited number of records.  Capture.JPG  

In this setting you can limit the memory reserved for displaying results, which saves memory and improves performance.  Add a Browse tool when you really need to see all results.

 

Performance Profiling

Capture.PNG

 

Have you ever wondered why exactly your workflow is taking so long? Is it the input or a join that seems to take forever? Performance profiling can answer those questions for you. It tells you how long each tool took to process and what share of the overall processing time went to that tool. Simply check the Enable Performance Profiling box in the Runtime tab under Workflow - Configuration, run the workflow, and then review the Results - Workflow - Messages.
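
To make "share of the overall processing time" concrete, here is a minimal Python sketch of the calculation. The tool names and timings are hypothetical values chosen for illustration, not output from a real workflow, and the function is not part of Alteryx.

```python
# Hypothetical per-tool timings in seconds (illustrative values only).
tool_seconds = {
    "Input Data": 12.0,
    "Join": 30.0,
    "Sort": 6.0,
    "Output Data": 12.0,
}


def time_shares(timings):
    """Return each tool's percentage of the total processing time."""
    total = sum(timings.values())
    return {tool: 100.0 * seconds / total for tool, seconds in timings.items()}


shares = time_shares(tool_seconds)
slowest = max(shares, key=shares.get)

# List tools from most to least expensive, as a profiling report would.
for tool, percent in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{tool}: {percent:.1f}%")

print("Start optimizing at:", slowest)
```

The tool with the largest share is usually the best place to start optimizing.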

 

 

Comments
Meteoroid

Very useful. Thank you

Asteroid

Thanks for sharing this.

Alteryx
Alteryx

One point about using the most efficient data types -- the comment "String fields with a big size can be costly", that's only if you are talking about fixed length strings, types String and WString. For the types V_String and V_WString your values take up no more space than is necessary (they have a couple of bytes extra to store the length, that's all). The specified size is just an upper limit, to make sure something crazy hasn't happened. For a fixed size string you use exactly the number of characters you specify, so that's where you would want to be careful. In the example shown, however, all the strings were variable sized. Where the example really saved space was changing things to integers.

 

Be careful not to let the AutoField tool change a ZIP code to an integer. "00234" is a perfectly fine ZIP code; you don't want it to turn into 234.
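
The effect is easy to reproduce in any language. Here is a small Python sketch (not Alteryx code) of what happens to leading zeros when a ZIP code is stored as an integer, and how padding can restore it only if you know the intended width:

```python
zip_code = "00234"

# Converting to an integer silently drops the leading zeros.
as_integer = int(zip_code)
print(as_integer)  # 234

# Restoring the value requires knowing it should be 5 characters wide.
restored = str(as_integer).zfill(5)
print(restored)    # 00234
```

Keeping ZIP codes as strings avoids the problem entirely.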

Is there a way to enter a code in the Events section of the workflow so that, when I configure it to send an email after a successful run, it adds a date or time stamp to the email?

Alteryx Alumni (Retired)

email.jpg

 

There's a timestamp in the first line of Output Log, which is the last code there ( %OutputLog% ).

 

Otherwise, you could place an Email tool at the end of your workflow (it would only activate if there are no errors upstream). You can use the Date Time Now tool to get a timestamp for that option.

 Auto Field tool tip is great. Thank you!!
