How to make a workflow run faster is one of the most common questions from clients I interact with, and some treat it as a personal challenge to see how fast they can get their module to run. However, in the famous words of our CTO Ned Harding, "Although Alteryx can be very fast, since it is such a general tool, it is only as good as the module that you have authored". There are multiple strategies for improving the speed of a workflow, from using a Select tool to reduce field sizes to looking at the default Sort/Join memory; however, the first fundamental step is benchmarking.

Benchmarking

The recommended process is to run the workflow three times so that the data is cached. If the data has not been cached, run times can be slower, so running the workflow three times makes for a fair test. If you want to measure run times without cached data, you will have to reboot your machine between runs. Once you have the total time the workflow takes to run, you can start optimizing.

Optimizing your workflow!

Sort/Join Memory

Alteryx is designed to use all of the resources it can. To run as fast as possible, it tries to balance using as much CPU, memory, and disk I/O as possible. Set your Dedicated Sort/Join Memory Usage lower or higher on a per-workflow basis, depending on the kind of work the workflow does. Sort work refers to the Sort tool and other similar tools that re-order your data. Join work refers to any of the Join processes. If you are doing memory-intensive non-sort work (e.g., large drive-time calculations), lower it! If you are doing memory-intensive sort work, raise it.
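This article does not describe how Alteryx sorts internally, so as a generic illustration only (an assumption, not Alteryx's implementation), here is a minimal external merge sort in Python showing why a dedicated sort memory budget matters. Records are sorted in memory-sized chunks, each chunk that doesn't fit is spilled to a temporary file, and the spilled runs are merged at the end. The hypothetical `memory_budget` parameter counts records rather than megabytes.

```python
import heapq
import pickle
import tempfile


def external_sort(records, memory_budget):
    """Sort `records`, holding at most `memory_budget` of them in memory per chunk.

    Generic external merge sort for illustration (not Alteryx's implementation).
    Returns (sorted_records, number_of_runs_spilled_to_disk).
    """
    if memory_budget < 1:
        raise ValueError("memory_budget must be at least 1")

    run_files = []
    buffer = []

    def spill(chunk):
        # Sort one in-memory chunk and write it to a temporary file on disk.
        chunk.sort()
        f = tempfile.TemporaryFile()
        for item in chunk:
            pickle.dump(item, f)
        f.seek(0)
        run_files.append(f)

    def read_run(f):
        # Stream a sorted run back from disk one record at a time.
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

    for rec in records:
        if len(buffer) == memory_budget:
            # Budget is full: spill this chunk to disk and start a new one.
            spill(buffer)
            buffer = []
        buffer.append(rec)

    if not run_files:
        # Everything fit within the budget: a pure in-memory sort, no disk I/O.
        return sorted(buffer), 0

    if buffer:
        spill(buffer)

    # k-way merge of the sorted runs; collected into a list here for demonstration.
    merged = list(heapq.merge(*(read_run(f) for f in run_files)))
    for f in run_files:
        f.close()
    return merged, len(run_files)


data = [5, 3, 9, 1, 7, 2, 8]
print(external_sort(data, memory_budget=10))  # ([1, 2, 3, 5, 7, 8, 9], 0)
print(external_sort(data, memory_budget=3))   # ([1, 2, 3, 5, 7, 8, 9], 3)
```

With a budget of 10 records, the seven values sort entirely in memory. With a budget of 3, the same sort writes three runs to disk and merges them, which is the extra disk I/O that a larger sort memory allocation helps avoid.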
Go to Workflow Configuration > Runtime tab > Dedicated Sort/Join Memory Usage > Use Specific Amount.

The Sort/Join memory setting is not a maximum memory usage setting; it is more like a minimum. This allocated memory is split among all the tools in your workflow that sort, but other tools still use memory outside that Sort/Join block, and some of them (e.g., drive times with a long maximum time) can use a lot.

Where do I find the Sort/Join memory options?

To set a user-level default Dedicated Sort/Join Memory Usage, go to Options > User Settings > Edit User Settings > Defaults tab. The global Default Dedicated Sort/Join Memory Usage at the system level can be found at Alteryx > Options > Advanced Options > System Settings > Engine > Default sort/join memory usage (MB).

For memory considerations that depend on your machine's bit version, please see here.

Lean for more speed!

Select tool

A best practice for optimizing the performance of your workflows is to remove data that won't be needed for downstream processing as early as possible. You can always bring that data back into the workflow later if necessary. The Select tool removes fields (columns) from your data. Other tools, such as Join, Join Multiple, Spatial Match, Find Nearest, and to a certain degree the Transform and Reporting tools, have built-in select functionality that you can use within the tool, reducing the need to add extra Select tools.

Filter tool

Another good way to optimize workflow performance is to use the Filter tool to remove unnecessary data. The Filter tool checks the records in your file against specified criteria, such as ZIP = 01001, and separates the records that match from those that don't. You can handle records from the True output differently from those in the False output by connecting additional tools to either side. This allows smaller amounts of data to be passed downstream.
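The Select and Filter advice above can be sketched in plain Python to show why trimming early helps: every field and record dropped up front is one that later steps never copy or process. This is an analogy, not Alteryx's API. `select_fields`, `filter_split`, and the sample fields are hypothetical, and the predicate mirrors the ZIP = 01001 example.

```python
def select_fields(records, keep):
    """Return copies of `records` containing only the fields named in `keep`."""
    return [{name: rec[name] for name in keep if name in rec} for rec in records]


def filter_split(records, predicate):
    """Split records into (true_output, false_output), like a filter's two outputs."""
    true_output, false_output = [], []
    for rec in records:
        (true_output if predicate(rec) else false_output).append(rec)
    return true_output, false_output


records = [
    {"ZIP": "01001", "Name": "Store A", "Notes": "free-text comment not needed downstream"},
    {"ZIP": "02139", "Name": "Store B", "Notes": "another unused comment"},
]

# Drop unneeded columns first, then split the rows on the filter condition.
slim = select_fields(records, ["ZIP", "Name"])
matched, unmatched = filter_split(slim, lambda rec: rec["ZIP"] == "01001")

print(matched)    # [{'ZIP': '01001', 'Name': 'Store A'}]
print(unmatched)  # [{'ZIP': '02139', 'Name': 'Store B'}]
```

Selecting before filtering means the filter, and everything attached to either of its outputs, only ever sees the two fields that matter.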
Auto Field Tool

Optimize your workflow for speed by setting each field to the most efficient type and the smallest possible size. Use the Auto Field tool right after your Input Data tool to assign the most efficient type and size to your fields. Another benefit of the Auto Field tool is that it reduces the size of your output file.

Enable Performance Profiling

This option shows a per-tool breakdown of run time, in milliseconds and as a percentage of the total, for the tools in your workflow. This breakdown lets you pinpoint the slower tools and processes in your workflow and apply the methods suggested in this article to improve them. Performance profiling can be found at Workflow > Runtime > Enable Performance Profiling.

Disable All Browse tools

The Browse tool quickly becomes a data artisan's best friend: it lets you review all of the data at any step of the workflow-building process. However, each Browse tool creates a temporary .yxdb file, and writing these files takes time and slows down processing. Instead of removing them, you can simply disable them, so they can easily be re-enabled when needed. This setting can be found at Workflow > Runtime > Disable All Browse Tools.

Set your limits: Record Limit for the Inputs

When developing your workflow, there is no need to bring in all of your data during testing. Use the Record Limit option in the properties of the Input tool to bring in just enough records for testing. If you want to set limits for all Input tools in your workflow, you can also do this on the Runtime tab under Workflow Configuration.

Tool Containers

The Tool Container lets you organize a workflow better by combining tools into logical groups. Tool Containers can be disabled to run only certain portions of the workflow, bypassing the disabled tools for a quicker run.

Cache Data

Designer now has the ability to cache data from relational databases through the Input tool.
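The caching idea can be sketched as a read-through cache: the first run queries the source and saves the rows to a local file, and later runs read that file instead of hitting the database again. This is a hypothetical Python illustration, not how Designer implements it. `cached_query`, `fake_fetch`, and the sample query are made-up names, and the cache is stored as JSON rather than in the .yxdb format Designer uses.

```python
import hashlib
import json
import pathlib
import tempfile


def cached_query(query, fetch, cache_dir):
    """Return rows for `query`, calling `fetch` only when no cached copy exists.

    Hypothetical sketch: one JSON file per query, named by a hash of the query text.
    """
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    path = pathlib.Path(cache_dir) / f"{key}.json"
    if path.exists():
        # Cache hit: read the saved rows instead of querying the source again.
        return json.loads(path.read_text(encoding="utf-8"))
    # Cache miss: query the source once and save the result for later runs.
    rows = fetch(query)
    path.write_text(json.dumps(rows), encoding="utf-8")
    return rows


calls = []


def fake_fetch(query):
    # Stand-in for a database call; records each hit so they can be counted.
    calls.append(query)
    return [{"id": 1}, {"id": 2}]


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_query("SELECT id FROM orders", fake_fetch, cache_dir)
    second = cached_query("SELECT id FROM orders", fake_fetch, cache_dir)

print(len(calls))       # 1: the second call was served from the cache file
print(first == second)  # True
```

Keying the cache file on a hash of the query text means a changed query automatically misses the cache and fetches fresh data.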
When the Cache Data option is checked, data is stored in a .yxdb file on disk so that data sources are not hit repeatedly during workflow development. Data can only be cached when running a workflow in an Alteryx Designer session; the setting is ignored when the workflow is run in the Scheduler, in the Gallery, or from the command line.

Connection Progress

Connection Progress is a great way to keep track of the number of records and the size of the data flowing from one tool to another. In addition, the thickness of the connection itself varies with the size of the data passing through it (great for troubleshooting). The default setting for Connection Progress is "Show Only When Running"; setting it to "Show" instead lets you inspect the size of the data at each point at any time, not just during a run (Properties for the Canvas > Connection Progress).

If you want more detail on any of the points mentioned above, make sure to check out the great Tips and Tricks articles from Margarita Wilshire et al.!

Tips & Tricks 2016
Tips & Tricks 2015
Tips & Tricks 2014

Best,
Jordan Barker
Solutions Consultant