This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
First added to Alteryx Designer in version 10.6, the Optimization Tool is a member of the Prescriptive Tools (included with the Predictive Tools installation) and allows you to solve optimization problems. Mathematical Optimization is the selection of the best possible option(s), given a set of alternatives and a selection criterion. In this Tool Mastery, we will review the inputs, configuration, and outputs of the Optimization Tool.
The Auto Field Tool : a tool so easy you don’t have to do anything – just put it on your canvas and viola. Automatically optimized data types. If you’re running into data type related issues and errors in your workflows, or just looking to add some speed or reduce the occupied disk space your data is hoarding – look no further than the Preparation Tool Category ’s Auto Field Tool, which reads through all the records of an input and sets the field type to the smallest possible size relative to the data contained within the column.
I have had several questions from clients over the last few weeks looking to use Alteryx as the ultimate ‘middle man’ between databases and their end output, without explicitly writing anything to memory in the process. Alteryx will happily oblige in this domain whilst also providing seamless data blending & advanced analytics in the processes.
Here are some potential ways you can achieve this goal:
Using our In-Database tools means you never actually bring data down into memory but when you are ready you can use our Data Stream Out Tooland seamlessly pass this into our analytic tool set or to your output location.
No Browse Tools
In addition to the above point if you choose not to add Browse Tools, Alteryx will only cache around 1 MB of data at each tool in a temp location. This temp location is then cleared when you close Alteryx. Therefore, it is only kept in memory for the duration of the development of the workflow rather than indefinitely. Your default temp location can be found in Workflow Properties.
Changing workflows or Applications
When writing out data you need not write to hard-coded paths - you can reference “%temp% in the file path. This will then write to the default location set in workflow properties outlined above. You can then reference this file location in the next workflow by using a combination of the Directory, Sort, Sample, and the Dynamic Input Tools to read in that file. Alteryx will as default write an Alteryx Engine file for each run. Using the above tools will allow you to dynamically read in the latest file and data. The bonus is that these engine files get cleared out on a scheduled basis so the cached data will not exist in memory over time. The workflow depicting this (attached) was built in Alteryx 10.6.
Output to the database or via one of our connectors which use an API
You can utilize the above method mentioned in ‘No Browse Tools’ however at the end of your workflow you can output directly to a database using one of our connectors via an API. As Alteryx and Tableau work together, often clients will use the Publish to Tableau Server Macro to take data directly from an Alteryx workflow and up to tableau without keeping any data in memory. Again the 1MB of cached data will be removed from the Alteryx engine files and Alteryx pushes the data via a ‘POST’ command directly to the Tableau Server.
The key component of any batch macro , the Control Parameter Tool is the gear that keeps things moving. Using the input , the control parameter accepts a field of values that will be used within the batch macro to reconfigure and rerun the macro for each of the standard input records - unless using the GroupBy feature that matches certain control parameters to buckets of records to be batched through the macro together. Adding this interface tool to any macro will upgrade it to a batch macro and will give you the ability to loop through macro configurations for added customizability. While one of the more sophisticated solutions you can build into your workflows, there are few problems you can’t solve with a batch macro:
We love helping users be successful with Alteryx, and this means providing a ton of great resources for getting started, learning more, and keeping you up to date with all the amazing stuff we're doing here at Alteryx… and the most compelling is Predictive!
Check out the Predictive District on the Gallery. There are great macros, apps, and sample workflows to demonstrate some nifty new tools. This post by DrDan on the Analytics Blog gives an overview of what's currently available – stay tuned for additions!
One of my favorites is the Predictive Analytics Starter Kit Volume 1. It enables you to learn the fundamentals of key predictive models with an interactive guided experience. Examples include Linear Regression, Logistic Regression, and AB Testing, and demonstrates the steps necessary to develop the dataset needed for analysis, and then how to actually build these predictive models yourself.
With v10.6, we introduced the Prescriptive Tool Category, comprising the Optimization and Simulation tools, to assist with determining the best course of action or outcome for a particular situation or set of scenarios. The Engine Works Blog has an introduction to this toolset, plus an extensive use case demonstration.
If you need more Optimization and Simulation action, there are several sample workflows, including Fantasy Sports Lineups (hey, sports fans – blog post here!), a mixing problem, workforce scheduling, and more!
Speaking of use cases, the software itself contains a plethora of predictive sample workflows - and the installed Starter Kits show up here, too! Help > Sample Workflows > Predictive Analytics.
Of course, don't forget the Predictive Analytics help pages, for overviews and configuration tips.
Visit our Product Training page for On-Demand and Virtual webinars on everything Predictive – regression modelling, cluster analysis, time series… As always, please begin with Data Prep and Investigation! Can I mention the Field Summary Tool enough times?
Want to show off the interactive visualizations from the models you've built? This Knowledge Base post shows you how. Another Engine Works post outlines how to build your own Custom Interactive Visualizations (Part 1 and counting…)
For the most in-depth, resource-rich training on leveraging predictive analytics to answer your business questions, consider the Udacity Predictive Analytics for Business NanoDegree. It consists of seven courses focused on selecting the right methodology, data preparation, and data visualization as well as four courses that will equip you to use predictive analytics to answer your business problems.
But really, it all starts with the Community. Cruise the Knowledge Base posts, search for Predictive or other favorite keywords, follow the blogs… and for the love of Ned, just play with the software! It's how we learn :)
Question Can you wait X seconds between processing each row in Alteryx?
Yes! Thanks to Invisio for creating an Inviso Macro Pack and posting on their blog here.
The "Wait a Second" macro lets you wait X number of seconds before processing each row in the dataset.
One application is if you are contacting an API with multiple requests. The WaitAsecond macro may help to pause the API long enough to process multiple rows without issue.
It can also be used to scrape sites without putting heavy loads on their server. An Invisio sample of scraping the Alteryx community (See Insights to the Alteryx Community)
As you can see, the part of the flow that runs through the WaitASecond tool gets a NOW timestamps which are 5 seconds a part, whereas the bottom stream, not running through the WaitASecond tool, all gets the same timestamp.
There are essentially two macros:
The first one assigns a unique id to each record and then uses that ID for the batch macro.
The batch macro has a “Command tool” that runs a ping that waits x seconds before timeout (184.108.40.206 if that exist in your network it won’t work).
The macro can be downloaded here (InvisoMacros.zip).
This question is one of the most commonly asked questions from clients I interact with and some treat it as a personal challenge to see how fast they can get their module to run. However, in the famous words of our CTO Ned Harding "Although Alteryx can be very fast, since it is such a general tool, it is only as good as the module that you have authored".
There are multiple strategies to improving the speed of a workflow from using a select tool to reduce field sizes or looking at the default sort join memory, however the first fundamental process is benchmarking.
The recommended process is to run the workflow three times to ensure the data has cached.
If the workflow has not cached the data then this can cause slower run times, so to ensure this is a fair test running the workflow three times should ensure all the data is cached.
If you want to run without cached data you will have to reboot your machine between runs.
Once you have the total time the workflow takes to complete/run you can now look at optimizing.
Optimizing your workflow!
Alteryx is designed to use all of the resources it possibly can. In order to make Alteryx run as fast as possible, it tries to balance the use of as much CPU, memory, and disk I/O as possible.
Set your Dedicated Sort/Join Memory Usage lower or higher on a per-Workflow basis depending on the use of your computer.
Sort work refers to the sort tool and other similar tools in re-ordering your data. Join work refers to any of the join processes.
If you are doing memory intensive non-sort work (i .e . large drive-times) then lower it!
If you are doing memory intensive sort-work then higher it.
Go to the Workflow Configuration > Runtime tab > Dedicated Sort/Join Memory Usage > Use Specific Amount
The Sort/Join memory setting is not a maximum memory usage setting; its more like a minimum, this allocated memory will be split between all the tools that sort in your workflow, but other tools will still use memory outside that sort/join block. Some of them (e .g . drive times with a long maximum time) can use a lot.
Where do I find the Sort/Join memory options?
To set a user level default dedicated Sort/Join Memory Usage, go to Options > User Settings > Edit User Settings > Defaults tab.
The global Default Dedicated Sort/Join Memory Usage at System level can be found at Alteryx > Options > Advanced Options > System Settings > Engine > Default sort/join memory usage (MB).
*******For machine bit version memory considerations please see here.
Lean for more speed!
A best practice to optimize the performance of your workflows is to remove data that won’t be needed for downstream processing as quickly as possible. You can always bring that data back into the workflow later if necessary.
The select tool removes fields or columns from your data. Other tools such as join, join multiple, spatial match, find nearest, and to a certain degree Transform tools and Reporting tools have some select functionality that you can utilize within the tool to reduce the need to add additional select tools.
Another good way to optimize workflow performance is using the filter tool to remove unnecessary data.
The filter tool queries records in your file that meet specified criteria and identifies these records in your data, such as ZIP = 01001 . You may choose to handle records that come from the True output differently than the false output by connecting additional tools to the workflow on either side. This will allow smaller amounts of data to be passed downstream.
Auto Field Tool
Optimize your workflow for speed by setting the field type to the most efficient type and smallest possible size.
Use the auto field tool right after your Input Data tool to assign the most efficient type and size to your fields.
Another benefit of using the auto field tool is that it will reduce the size of your output file.
Enable Performance Profiling
This option will allow you to see a milliseconds and percentage breakdown per tool in your workflow.
Having this breakdown will allow you to pinpoint the slower tools/processes in your workflow and use the methods suggested in this article to improve that tool/process.
Performance profiling can be found Workflow > Runtime > Enable Performance Profiling.
Disable All Browse tools
The Browse tool quickly becomes a data artisans best friend, it allows to see/review the entire data at any given step in the workflow building process, however, each of these browse tools creates a temporary yxdb and writing these files do take some time and slow down the processing.
There is an option to simply disable them so they can be easily enabled if need it. This setting can be found at Workflow > Runtime > Disable All Browse Tools.
Set your limits: Record Limit for the Inputs
When developing your Workflow, there is no need to bring in all your data during testing.
Use the Record Limit option in the Properties for the Input to bring enough records for testing.
If you want to set limits for all input tools in your workflow, you can also do this under the Runtime tab under Workflow – Configuration.
The tool container allows the user to organize a workflow better by combining tools in logical groups.
Tool Containers can be disabled to run only certain portions of the workflow, effectively bypassing tools for a quicker run.
Designer now has the ability to cache data from relational databases through the input tool.
When checked, data is stored in an yxdb file on disk so that data sources are not hit repeatedly during workflow development.
Data can only be cached when running a workflow in an Alteryx Designer session. The setting is ignored when the workflow is run in the scheduler, in the gallery, or from the command line.
The Connection progress is a great way to keep track of the number of records and the size of the data going from one tool to another. In addition to that, the thickness of he connection itself varies depending on the size of data passing through (great for troubleshooting).
The default setting for the Connection Progress is “Show Only When Running” however leaving this set as ‘Show” will allow you to investigate the size of the data at certain points permanently (Properties for the Canvas > Connection progress).
If you want more detail on any of the points mentioned above make sure to check out the great Tips and Tricks articles from Margarita Wilshire et al!
Tips & Tricks 2016
Tips & Tricks 2015
Tips & Tricks 2014
Sometimes, especially when interacting with tools/files outside Alteryx, you need to explicitly control the order of events. The classic example is a process whereby you 1) INSERT to a SQL table where an auto-incrementing ID is applied 2) read the records back in order to get the auto-incremented id 3) insert the auto-incremented ID into a different table.
In Alteryx worflow streams that terminate (as in an Output or Render), you can't reliably use Block Until Done to control the stream. One way to reliably control the order of events is with batch macros.
Batch macros guarantee that 1) every iteration is complete before the next iteration begins, and 2) that all iterations are complete before the macro reports itself complete. So if you place a process inside a batch macro and create an output stream from that macro, all processing will be done and the macro still can exist in the middle a workflow stream. You don't even need a Block Until Done as part of the process.
Please see the attached example (authored in 10.0).
The Auto Field tool examines your data, and automatically optimizes the field type and length. Take a look at your data with a Select tool, follow this up with an Auto Field, and follow that up with another Select tool to see what kind of changes you’ve made. After you run the module, you can examine each Select tool to garner a before-and-after view of the adjustments made to the fields. You can even take this a step further, and add a few Browse tools to see how your database actually decreases in size, you may be surprised by how much! In the below view, the file size was reduced by about 40% with the Auto Field tool, on just 50,000 records and one field. Now consider running a file of millions of records and the amount of the size decrease becomes really substantial! Before Auto Field View: After Auto Field View:
Sometimes multiple conclusions can be drawn from the same data. Ok, often multiple conclusions can be drawn from the same data. This is especially the case with the Connection Progress that pops up between tools. You may be a bit familiar with this already. When you run a module, you may see something similar to the following: 114gb of data is being passed through my data stream! Is this a lot? Well, yes, but ultimately we have to remember that Alteryx processes everything in memory. Knowing this, the information that we see above doesn't mean we have 114gb of data being written directly to disk (many PC's don't even have this much available). Simply put, there is a ton of data there but if you do not have any type of output connected to the tool, it stays in memory. If we were to connect say, a Browse Tool to the end of my XML Parse Tool shown above, the temp file written out by my Browse Tool would in fact be every bit of that 114gb. Luckily, I don't really need the data written out at this point (I'm performing further analysis downstream), so I simply add a Select Tool just after this and de-select the field with the massive amount of data and just like magic, my module runs very fast and efficient. This little bit of info can be both extremely valuable and scary at the same time. The value is simply that it shows you the amount of data you are dealing with. The scary part is that it can be assumed this is all being written out to disk during runtime. We now know that as long as we're not attaching a Browse Tool to the data at this point, and we deselect the fields we do not need further downstream, we keep our module tidy and efficient! Until next time, - Chad Follow me on Twitter! @AlteryxChad
Alteryx is designed to use all of the resources it possibly can. In order to make Alteryx run as fast as possible, it tries to balance the use of as much CPU, memory, and disk I/O as possible. The good news is that most of the resource utilization can be controlled. You can limit the amount of memory that is used on a system, user, or module level. The Sort/Join memory setting is not a maximum memory usage setting; it’s more like a minimum. One part of Alteryx (sorts) that benefits from having a big chunk of memory will take that entire amount right from the start. It will be split between all the sorts in your module, but other tools will still use memory outside that sort/join block. Some of them (e.g. drive times with a long maximum time) can use a lot. If a sorting can be done entirely in memory, it will go faster than if we have to fall back to temp files, so that’s why it’s good to set this higher. But if the total memory usage on the system pushes it into virtual memory, you’ll be swapping data to disk in a much less optimal way, and performance will be much worse and that’s why setting it too high is a bigger concern. The Default Dedicated Sort/Join Memory Usage can be found in the Designer at Options > User Settings > Edit User Settings Best Practices on Memory Settings 32-bit machines*: Setting should be on the lower, conservative side. No matter how much actual RAM is there, only has at maximum 1 GB available, as soon as it is set higher, the machine will cross over into virtual memory and be unable to recover. A 32-bit machine should never have a setting over 1000MB, and 512 is a good setting. Set it low (128 MB), especially when using Adobe products simultaneously with Alteryx. 64-bit machines: Set this in the system settings to half your physical memory divided by the number of simultaneous processes you expect to run. If you have 8 GB of RAM and run 2 processes at a time, your Sort/Join memory should be set to 2GB. You might set it lower if you expect to be doing a lot of memory intensive stuff on the machine besides Alteryx Set your Dedicated Sort/Join Memory Usage lower or higher on a per-module basis depending on the use of your computer, doing memory intensive non-sort work (i.e. large drive-times) then lower it, doing memory intensive sort-work then higher.
*Please refer to this link for additional details on 32-bit support for Designer
A great new feature in v10 is the ability to turn on profiling to see which tools are taking up most of the time in your module.
You can turn it on here in the Workflow Configuration in the Runtime section:
This setting will add a profile of the tool times (in descending order) to your output log:
Thanks to @ScottL for mentioning this at the Analytic Independence Roadshow in London, it's really useful for helping optimise workflows, and as he pointed out the screenshot shows off the speed of the regex engine too!
Any tool that performs a Sort will increase processing time. Many tools in Alteryx utilize a sorting process as a facet of their inherent functionality or if a group by option is used. The Summarize tool does this, for example. Alteryx has the ability to remember Sorts, or said another way, it knows if data has already been sorted upstream. This is a behavior that you can take advantage of as yet another means of optimizing your module’s processing. In the example shown, the process in Green is more efficient, as sorting will only be performed on the data once. For the process in Red, sorting will have to be performed twice.
Tools that Sort
Tools that use Sorts:
Tools use Temp Files and/or Internal Sorts:
Block Until Done
Multi Row Formula
Distance (Guzzler Only)
Trade Area (Guzzler Only or Non-Overlapping)
Run Command (Temp files only)
Use Temp Files and/or Internal Sorts at the Process level:
Batch Macros (for secondary inputs)
Iterative Macros (for secondary inputs)
Behavior Create Profile
Any time you are planning to run a module in the background while you are going to continue doing other work, it is a good idea to run it with less memory.
It is also good to check the box for running Alteryx at a lower priority. This will ensure that the Alteryx Engine runs at a lower priority than all the other applications running on the same machine. By doing so, even the Alteryx GUI will remain responsive when you are running a large module in the background. It is also a good idea to have the temporary directory point to a separate physical hard drive from your boot drive. If your temp drive points to C: emp and you run a module that consumes 100's of GB of Temp space (it happens), your system may become unstable.
Here are the two most recommended best practices for optimizing module speed: Disable your Browse tools
Disable your browse tools via the Module Properties window. This checkbox can allow your module to run faster, and take up less memory and temp space by preventing Alteryx from having to generate the content (temporary .yxdb files) that must be displayed in your Browse tools.
Close the Output Window
If you find that your module produces warnings, go ahead and close the Output Window in order to speed things up for your module. This is a feature that if asked to display hundreds or thousands of lines of information, can slow down your module. To reopen the Output Window, go to View > Show Output Window.
Apart from the Browse tool, you may find that the Select tool is the most frequently used tool in Alteryx, and it should be! The best way to optimize the performance of your modules is by removing data from your processing stream as quickly as possible.
Ned's has a new blog post! If you aren't currently following InspiringIngenuity.net, he regularly covers topics that run the gamut of Alteryx engineering, all of which directly relate to many client needs. In the latest post, he covers optimization of modules, a topic often brought up by clients. If you have read our 2014 Tips & Tricks guide, you may remember some of these, such as limiting the data you push through a module and optimizing the data with the Auto Field Tool. Ned goes into more detail on the 'why' and includes an optimized version of the Percentile Macro that he updated using these methods. Check out the Inspiring Ingenuity post, and let us know your thoughts in the comments. Thanks for reading, until next time! Chad Follow me on Twitter! @AlteryxChad http://inspiringingenuity.net/2014/11/03/alteryx-optimizing-modules-for-speed/
Here in Customer Support, we like to make sure our workflows run as smoothly as possible. Sometimes you may ask yourself “Why can’t my excel file just open right after I output it?”
Where there is a will, there’s a way.
Ladies and Gentleman I introduce to you the beautiful functionality of the Run Command Tool.
With this trick, you never have to go into the directory to manually open the saved file again.
Note: There are multiple ways of doing this with the run command. This is just a simple example I created in version 10.1.
For this trick you’ll only need 5 additional tools.
Before your output tool, you’ll need a Block Until Done Tool. This is for quality control as Excel doesn’t like to open when you are writing in to it at the same time. This prevents Alteryx from writing and opening at the same time.
Next, you’ll need a Formula Tool, which will just add another field with your command. When making this dynamic, you’ll want to make sure you update both the path of your output and formula tool.
Whether you do that in a macro or app, it’s entirely up to you. After the formula tool, you’ll want a sample tool so that way we only select tool to deselect all the fields except for your command. Now just make sure your run command is set up properly. In this example it’s set to write out the command as a bat file and be read back in as a command.
To learn more about how to configure the run command, I would read though this helpful article.
After you have this set, watch as your workflow has made your life even more automatic.
Truncated data is usually defined as numeric rounding or cut off, string shortening, or datum deletion - essentially any time information is lost. Since, as analysts, our insights are only as good as our data, we usually find ourselves trying to preserve the integrity of data as we’re processing it. That’s not to say we can’t also optimize our resources usage when our data will allow for it.
While keeping your data types as small as possible is important, and can serve to shorten run times, it is even more important to understand what data types are most accommodating to your data and in what situations they can be shortened without truncation. If you’ve ever seen Office Space, you understand just how important even fractions of a cent are – and accuracy doesn’t just apply to financial data.
To avoid truncation in your data you have to first explore your data types in the context of your analyses. What data types and sizes did you receive the data in and why? What format would we like to see our results in? Are there opportunities to reduce memory usage between those two? And finally - what operations will we perform on the fields in our workflows that may impact each data type and size? The answers to these questions will be unique to each dataset, but once they’re addressed you can use the same techniques to keep your data both optimized and accurate.
Start by identifying the data types that most closely fits your fields based on the questions above. If you want Alteryx’s best guess, try using the Auto Field Tool to assign optimized data types and sizes automatically. While this tool is immensely helpful, be sure to check that the output is not truncating data or leaving it in a form less conducive to your downstream analyses – the Auto Field Tool doesn’t know the answers to your questions above. You can have the best of both worlds by adjusting the assignments from the Auto Field Tool, where necessary, by placing a Select Tool (master it here) just afterwards. You’ll then have suggestions and be able to change the less accurate/accommodating assignments by hand. Some things to consider:
If performing string operations later in your workflow that may increase their length, pay close attention to see if they are being truncated after that maximum string value is reached. String and WString (accepts Unicode characters) types are set length and will drop any characters that exceed their size. On the other hand, V_string and V_WString (accepts Unicode characters) are of variable length, and will adjust to accommodate strings after assignment.
Numerics may seem the most intuitive of the bunch, but pay close, close attention to the precision of each type so as to avoid unintentional rounding. Fixed Decimal is the only type to have an adjustable length – the rest may force your data to fit if not assigned to the correct type.
Dates are not always input in the standard ISO format yyyy-mm-dd HH:MM:SS, and may require some converting in order to handle them as Date/Time types in the Designer (this is important when trying to sort or filter by date or apply Date/Time functions in formulas). Any other date format will be interpreted as a string, and treated as such, unless converted using the DateTimeParse() function, the DateTime Tool, the Parse Dates Macro, or the fan favorite BB Date Macro. If you need to return to your original formatting, or simply prefer another, you can always convert your dates back into another format string after they’ve been processed using the DateTimeFormat() function or the DateTime Tool.