As we develop workflows, we inevitably need to make changes downstream of our data. To see the effect of those changes, we need to re-run the workflow. Depending on the data, this can take a long time. Let’s be honest: even waiting a couple of minutes to re-run the workflow is too much time... Ain't nobody got time for that!
Alteryx 2018.3 is here to help, with a shiny new caching feature! With caching, you can drastically reduce the running time of a workflow. Now, you can make changes without waiting for every single tool on your canvas to re-run.
What's a cache?
A cache is "a computer memory with very short access time, used for the storage of frequently or recently used instructions or data." In the context of an Alteryx workflow, what gets cached is the data you're bringing into and modifying with Alteryx Designer.
Here are two of the biggest reasons you may want to use caching:
- Pulling in data can take a long time. Often, the most time-intensive part of building a workflow is just getting the data into the workflow. Creating a local cache after your Input Data tool or Download tool can save you buckets of time as you develop your workflow.
- Cut out the up-front processing time. Once data is in the workflow, it may take a few operations to get it ready for the main event. Maybe the dataset is larger than you need, so you need to use Filter, Join, or Select tools to slim it down to what you'd actually like to work with. Since the raw data you're working with prior to this trimming is much larger, it takes longer to process. Dropping a cache after all the initial pre-processing is done can save you the time of processing this logic every time the workflow is re-run.
How do I use Caching?
- To create the caching point, you click on the last tool you want to be included in your cached data (i.e., you will not make changes to this tool or any upstream tools).
- Right-click this tool and select Cache and Run Workflow. This will re-run your workflow, so you can use this opportunity to go get a cup of coffee if necessary.
- As it runs, you will see boxes that look like ice cubes forming around each tool, up to and including the tool you chose as your caching point. This indicates that the data is cached or “frozen” in these tools.
- Now, you can make changes downstream and see your workflows run at warp speed (or at least see a drastic reduction in runtime).
Step 1: Right click, Step 2: "Cache and Run Workflow", Step 3: Profit
How it works
Some of the top questions we receive on caching are regarding how this feature works. So here’s a behind-the-scenes look!
Caching plugs into the temp file management system that Alteryx Designer already uses for all sorts of things: previewing your results at any point in your workflow, bringing In-Database data in with the Data Stream Out tool, and of course, populating Browse tools with their data and reports. These temp files are used in the moment to support visualizing and previewing your data while you build out the workflow. When you close the workflow or Alteryx Designer, these temp files are cleared out so they don't stick around and eat up space on your computer.
Here is what's going on when you click Cache and Run Workflow on the Download tool in the video above:
- An invisible Output Data tool is dropped in after the Download tool. The Output Data tool is configured to write out a yxdb file to the temp directory.
- The workflow is run, and the caching yxdb file is written out to the temp directory.
- The invisible Output Data tool is removed.
- The tools in this workflow up to and including the Download tool are then disabled, much as if they were in a disabled Tool Container. These are the tools that now have the ice cube/bubble.
- Lastly, an invisible Input Data tool is dropped into the workflow in place of the Download tool.
So, the next time the workflow runs, it picks up from this temporary cache file instead of starting at the beginning. When you edit a tool that has been disabled in a cache, all of these changes revert and you're back to your regular workflow.
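If it helps to see the steps above as code, here is a minimal Python sketch of the same idea. To be clear, this is not how Alteryx Designer is implemented: the `Workflow` class, its method names, and the toy tools are all hypothetical, and it uses a pickle file where Designer writes a yxdb file. It only mimics the pattern of writing a temp file at the cache point and starting from that file on later runs.

```python
import os
import pickle
import tempfile


class Workflow:
    """Toy linear workflow (hypothetical, not Alteryx code).

    Each tool is a (name, function) pair; the function takes the
    previous tool's output and returns its own output.
    """

    def __init__(self, tools):
        self.tools = list(tools)
        self.cache_index = None  # position of the tool chosen as the cache point
        self.cache_path = None   # temp file holding the cached records
        self.last_run = []       # names of the tools that executed on the last run

    def cache_and_run(self, tool_name, data=None):
        """Run the whole workflow once and save the output of `tool_name`."""
        self.clear_cache()
        index = [name for name, _ in self.tools].index(tool_name)
        self.last_run = []
        # Run every tool up to and including the cache point.
        for name, func in self.tools[: index + 1]:
            data = func(data)
            self.last_run.append(name)
        # Stand-in for the invisible Output Data tool: write to the temp directory.
        fd, path = tempfile.mkstemp(suffix=".cache")
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(data, handle)
        self.cache_index, self.cache_path = index, path
        return self._run_from(index + 1, data)

    def run(self, data=None):
        """Run the workflow, skipping cached tools when a cache exists."""
        self.last_run = []
        if self.cache_path is None:
            return self._run_from(0, data)
        # Stand-in for the invisible Input Data tool: read the temp file instead.
        with open(self.cache_path, "rb") as handle:
            data = pickle.load(handle)
        return self._run_from(self.cache_index + 1, data)

    def _run_from(self, start, data):
        for name, func in self.tools[start:]:
            data = func(data)
            self.last_run.append(name)
        return data

    def clear_cache(self):
        """Editing a cached tool, or closing the workflow, drops the cache."""
        if self.cache_path is not None:
            os.remove(self.cache_path)
        self.cache_index = self.cache_path = None


workflow = Workflow([
    ("download", lambda _: [3, 1, 2]),               # pretend this is the slow part
    ("sort", sorted),
    ("double", lambda rows: [r * 2 for r in rows]),
])

first = workflow.cache_and_run("sort")
ran_first = list(workflow.last_run)   # every tool ran on the caching run

second = workflow.run()
ran_second = list(workflow.last_run)  # only the tool downstream of the cache ran

workflow.clear_cache()
```

On the second run only the `double` tool executes, because `download` and `sort` are replaced by a read from the temp file. Calling `clear_cache()` models what happens when you edit a cached tool: the temp file goes away and the next run starts from the beginning again.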
What tools can I cache?
There are two rules for a tool being eligible for caching:
- The tool must have a single (1) output anchor.
- The tool must not be in a 'circular' position.
The reason for these limitations is really simple: to avoid accidentally caching data that you weren't expecting to cache. The last thing we want to do is crash your computer while trying to save a cache billions of records long!
For a deep dive into these limitations, check out @SydneyF's excellent article "What Can't Be Cached?"
That's all I've got for now!
Special thanks to my talented colleague @MindiG, who co-authored this blog post and made the video demo.