With the release of 2018.3 comes exciting new functionality: workflow caching! Caching can save a lot of time during workflow development by saving data at “checkpoints” in the workflow. Each time you add a new step, the workflow does not need to rerun in its entirety; it can pick up from your last cache point.
To create a cache, right-click the tool where you would like the data to be cached, and select the Cache and Run Workflow option from the drop-down menu.
Note: If you have enabled the Disable All Tools that Write Output option for your workflow, caching will still be offered as an option, but no data will be cached. This is because caching writes a temporary output file that is referenced when the workflow is re-run from the cache point.
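To make the idea of a cache point concrete, here is a minimal Python sketch of checkpoint caching in general. It is not how Alteryx implements the feature: the Checkpoint class, its cache_dir and outputs_disabled parameters, and the JSON file format are all hypothetical choices made for illustration. The upstream steps run once, their result is written to a temporary file, and later runs read that file instead. When output writing is disabled, nothing is written, so every run starts from the top.

```python
import json
import os
import tempfile


class Checkpoint:
    """Hypothetical cache point: stores upstream results in a temporary file."""

    def __init__(self, cache_dir, name, outputs_disabled=False):
        self.path = os.path.join(cache_dir, name + ".cache.json")
        # Assumption: mirrors the effect of "Disable All Tools that Write Output".
        self.outputs_disabled = outputs_disabled

    def run(self, upstream):
        """Return cached data if present; otherwise run upstream and cache it."""
        if os.path.exists(self.path):
            with open(self.path) as handle:
                return json.load(handle)
        data = upstream()
        if not self.outputs_disabled:
            with open(self.path, "w") as handle:
                json.dump(data, handle)
        return data


def slow_upstream_steps():
    """Stand-in for everything upstream of the cache point."""
    return [{"id": 1, "total": 10}, {"id": 2, "total": 25}]


if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    checkpoint = Checkpoint(cache_dir, "after_summarize")
    first = checkpoint.run(slow_upstream_steps)   # runs upstream, writes the file
    second = checkpoint.run(slow_upstream_steps)  # reads the file instead
    print(first == second)
```

With outputs_disabled=True the same calls succeed, but the upstream function is invoked every time because no file is ever written.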
In versions earlier than 2022.3, a few tools in Alteryx cannot be used as cache points. Two major conditions prevent a tool from being eligible for caching. The first and most straightforward is having multiple outputs.
In versions earlier than 2022.3, tools with multiple output anchors cannot be cached. This includes the Join Tool, many (but not all) of the Predictive Tools, the R Tool, the Python Tool, as well as a few others. Starting in 2022.3, tools that have multiple output anchors (up to a maximum of 5 anchors) can be cached. This feature extends the caching functionality to over 50 additional tools. Please note that some tools (including those with more than 5 output anchors and In-DB tools) might still not be cacheable.
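The anchor-count rule can be summarized in a few lines of Python. This is only a sketch of the rule as described here, not an Alteryx API: the function name passes_anchor_rule and the (year, release) version tuple are assumptions made for illustration. It does not model In-DB tools or any other tool-specific exceptions, so a True result means only that the anchor count does not rule the tool out.

```python
def passes_anchor_rule(output_anchors, version):
    """Check the output-anchor condition for a cache point.

    output_anchors: number of output anchors on the tool.
    version: hypothetical (year, release) tuple, e.g. (2022, 3) for 2022.3.
    """
    if output_anchors < 1:
        # A tool with no output anchor has no data stream to cache.
        return False
    if output_anchors == 1:
        # Single-output tools pass this condition in every version.
        return True
    # Multiple outputs: allowed from 2022.3 onward, up to 5 anchors.
    return version >= (2022, 3) and output_anchors <= 5


if __name__ == "__main__":
    for anchors, version in [(1, (2018, 3)), (3, (2021, 4)), (3, (2022, 3)), (6, (2022, 3))]:
        print(anchors, version, passes_anchor_rule(anchors, version))
```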
The second condition is a little trickier to understand conceptually. Any tool that is in a “circle” cannot be cached.
A “circle” is the condition where the output of a tool is combined with a different branch of the same data stream, effectively creating a circle around the tool with the connection lines. Here are some examples of uncacheable circles:
The reason tools in this condition cannot be cached is similar to why tools with multiple output anchors are excluded. In a “circle situation”, the downstream tool requires data from both stream #1 and stream #2 in order to proceed. The only way to effectively cache in this situation would be to create an additional, invisible cache for the tools being joined in parallel.
To make sure only the expected data is cached and to prevent unintentional overuse of resources, no such hidden “ghost” cache is created, which disqualifies tools in “circles” from serving as cache points.
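One way to read the “circle” condition in graph terms is sketched below in Python. This is an interpretation for illustration, not a description of how Alteryx performs the check: the function names in_circle and reachable, and the representation of a workflow as (source, destination) pairs, are assumptions. Under this reading, a tool is in a circle when some tool upstream of it can still reach some tool downstream of it along a path that bypasses the tool itself.

```python
from collections import defaultdict


def reachable(edges, start, blocked=None):
    """Return every tool reachable from start, never stepping onto `blocked`."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt != blocked and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def in_circle(connections, tool):
    """True if a parallel branch of the same data stream bypasses `tool`."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for src, dst in connections:
        downstream[src].add(dst)
        upstream[dst].add(src)

    ancestors = reachable(upstream, tool)
    descendants = reachable(downstream, tool)
    # Look for an ancestor that still reaches a descendant without passing through tool.
    for ancestor in ancestors:
        if reachable(downstream, ancestor, blocked=tool) & descendants:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical workflow: one input split into two branches that are unioned again.
    workflow = [
        ("Input", "Formula"),
        ("Input", "Select"),
        ("Formula", "Union"),
        ("Select", "Union"),
        ("Union", "Sort"),
    ]
    for name in ["Input", "Formula", "Select", "Union", "Sort"]:
        print(name, in_circle(workflow, name))
```

In this hypothetical workflow, Formula and Select sit on parallel branches of the same stream, so each is bypassed by the other and is flagged. Union and Sort lie downstream of the circle and are not flagged.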
The good news is that tools with single outputs downstream from tools with multiple outputs or “circles” can be cached without any issue!
Now that you are well versed in the limitations of workflow caching, you should be able to develop new workflows and test and modify old workflows faster than ever before!