
Engine Works Blog

Under the hood of Alteryx: tips, tricks and how-to's.
Alteryx

As we develop workflows, it is inevitable that we will need to make additional modifications downstream of our data. To see the updates, we need to run the workflow. Depending on the data, this can take a long time. Let’s be honest, even waiting a couple of minutes to re-run the workflow is too much time... Ain't nobody got time for that!

 

Alteryx 2018.3 is here to help, with a shiny new caching feature! With caching, you can drastically reduce the running time of a workflow. Now, you can make changes without waiting for every single tool on your canvas to re-run.

 

What's a cache?

 

A cache is "a computer memory with very short access time, used for the storage of frequently or recently used instructions or data." In the context of an Alteryx workflow, this would be the data you're bringing into and modifying with Alteryx Designer.

 

There are many reasons you may want to use caching...

 

  • Pulling in data can take a long time. Often, the most time-intensive part of building a workflow is just getting the data into the workflow. Creating a local cache after your Input Data tool or Download tool can save you buckets of time as you develop your workflow.
  • Cut out the up-front processing time. Once data is in the workflow, it may take a few operations to get it ready for the main event. Maybe the dataset is larger than you need, so you need to use Filter, Join, or Select tools to slim it down to what you'd actually like to work with. Since the raw data you're working with prior to this trimming is much larger, it takes longer to process. Dropping a cache after all the initial pre-processing is done can save you the time of processing this logic every time the workflow is re-run.

 

How do I use Caching?

 

  1. To create the caching point, you click on the last tool you want to be included in your cached data (i.e., you will not make changes to this tool or any upstream tools).
  2. Right-click this tool and select Cache and Run Workflow. This re-runs your workflow, so you can use the opportunity to go get a cup of coffee if necessary.
  3. As it runs, you will see boxes that look like ice cubes forming around each tool, up to and including the tool you chose as your caching point. This indicates that the data is cached or “frozen” in these tools.
  4. Now, you can make changes downstream and see your workflows run at warp speed (or at least see a drastic reduction in runtime).

 

 
[Animation: caching live 3.gif] Step 1: Right-click, Step 2: "Cache and Run Workflow", Step 3: Profit

 

 

How it works

 

Some of the top questions we receive on caching are regarding how this feature works. So here’s a behind-the-scenes look!

 

Caching plugs into the temp file management system that is being used by Alteryx Designer for all sorts of things, including the ability to preview your results at any point of your workflow, bringing In-Database data in with the Data Stream Out tool, and of course, populating Browse tools with their data and reports. These temp files are used in-the-moment to support visualizing your data and previewing it while you build out the workflow. When the workflow is closed, or you close Alteryx Designer, these temp files are cleared out so they don't stick around and eat up space on your computer.
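
To make the temp file lifecycle concrete, here is a minimal Python sketch of the idea. It is an analogy only: the `DesignerSession` class and the file name are hypothetical and are not part of any Alteryx API. It just shows scratch files that exist while a session is open and disappear when it closes.

```python
import tempfile
from pathlib import Path


class DesignerSession:
    """Hypothetical stand-in for an open workflow; not an Alteryx API."""

    def __init__(self):
        # Each session gets its own scratch directory for previews and caches.
        self._tmp = tempfile.TemporaryDirectory(prefix="workflow_session_")
        self.temp_dir = Path(self._tmp.name)

    def write_temp(self, name: str, data: bytes) -> Path:
        # Write a scratch file (for example, the data behind a preview).
        path = self.temp_dir / name
        path.write_bytes(data)
        return path

    def close(self):
        # Closing the session deletes the directory and everything in it.
        self._tmp.cleanup()


session = DesignerSession()
preview = session.write_temp("browse_preview.bin", b"some preview rows")
existed_while_open = preview.exists()   # file is available during the session
session.close()
exists_after_close = preview.exists()   # file is gone once the session closes
print(existed_while_open, exists_after_close)
```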

 

Here is what's going on when you click Cache and Run Workflow on the Download tool in the video above:

 

  1. An invisible Output Data tool is dropped in after the Download tool. The Output Data tool is configured to write out a yxdb file to the temp directory.
  2. The workflow is run, and the caching yxdb file is written out to the temp directory.
  3. The invisible Output Data tool is removed.
  4. The tools in this workflow up to and including the Download tool are then disabled, much as if they were in a disabled Tool Container. These are the tools that now have the ice cube/bubble around them.
  5. Lastly, an invisible Input Data tool is dropped into the workflow in place of the Download tool.

So, next time the workflow runs it will pick up from this temporary cache file instead of at the beginning. When you edit a tool that has been disabled in a cache, all of these changes revert and you're back to your regular workflow.
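
The steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Designer is implemented: the `Workflow` class, its method names, and the sample steps are all hypothetical, and `pickle` stands in for the yxdb file. It shows the three behaviors described: the cache is written during one full run, later runs start from the cache file, and editing a cached tool reverts to a regular run.

```python
import pickle
import tempfile
from pathlib import Path


class Workflow:
    """Toy linear workflow: each step is a function applied to the previous output."""

    def __init__(self, steps):
        self.steps = list(steps)   # ordered callables, one per "tool"
        self.cache_index = None    # position of the tool chosen as the caching point
        self.cache_file = None     # temp file holding that tool's output
        self.calls = []            # names of the steps that actually executed

    def _execute(self, step, data):
        self.calls.append(step.__name__)
        return step(data)

    def cache_and_run(self, index, temp_dir):
        """Run every step once and write the output of steps[index] to a temp file."""
        if not 0 <= index < len(self.steps):
            raise IndexError("caching point must be an existing step")
        cache_file = Path(temp_dir) / f"cache_{index}.pkl"
        data = None
        for i, step in enumerate(self.steps):
            data = self._execute(step, data)
            if i == index:
                # Plays the role of the invisible Output Data tool.
                cache_file.write_bytes(pickle.dumps(data))
        self.cache_index, self.cache_file = index, cache_file
        return data

    def run(self):
        """Start from the cache file if there is one, otherwise from the beginning."""
        if self.cache_index is None:
            data, start = None, 0
        else:
            # Plays the role of the invisible Input Data tool.
            data = pickle.loads(self.cache_file.read_bytes())
            start = self.cache_index + 1
        for step in self.steps[start:]:
            data = self._execute(step, data)
        return data

    def edit_step(self, index, new_step):
        """Editing a cached (disabled) step drops the cache; downstream edits keep it."""
        self.steps[index] = new_step
        if self.cache_index is not None and index <= self.cache_index:
            self.cache_index, self.cache_file = None, None


def download(_):            # stands in for the slow Download tool
    return [3, 1, 2]

def sort_rows(rows):
    return sorted(rows)

def double(rows):
    return [r * 2 for r in rows]

def triple(rows):
    return [r * 3 for r in rows]


with tempfile.TemporaryDirectory() as temp_dir:
    wf = Workflow([download, sort_rows, double])
    first = wf.cache_and_run(0, temp_dir)    # full run, cache written after download

    wf.calls.clear()
    wf.edit_step(2, triple)                  # downstream change, cache stays valid
    second = wf.run()
    calls_with_cache = list(wf.calls)        # download is skipped

    wf.calls.clear()
    wf.edit_step(0, download)                # touching a cached tool clears the cache
    third = wf.run()
    calls_after_revert = list(wf.calls)      # everything runs again

print(calls_with_cache, calls_after_revert)
```

Note that in this sketch the cached data is a snapshot taken during the caching run; nothing upstream is consulted again until the cache is dropped.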

 

What tools can I cache?

 

There are two rules for a tool being eligible for caching:

  1. The tool must have a single (1) output anchor.
  2. The tool must not be in a 'circular' position.

 

The reason for these limitations is really simple - to avoid accidentally caching data that you weren't expecting to cache. The last thing we want to do is crash your computer while trying to save a cache billions of records long!

 

For a deep dive into this limitation, check out @SydneyF's excellent article What Can't Be Cached?

 

 

That's all I got for now, so...

 

[Image: cachemeoutside.PNG]

 

 

 

Special thanks to my talented colleague @MindiG, who co-authored this blog post and made the video demo.

Alex Koszycki
Program Manager, Community Platform

Alex is acutely aware of all the sleep he's lost wrangling gigantic data-sets. But that's ok; now he gets to work with the Alteryx Community, spreading a new culture of analytics. Get it done quicker, automate that task, and have more time to think about the bigger picture. Also it's fun, so there's that.


Comments
Alteryx Certified Partner

When I run and cache the workflow, I cannot view the configuration of the individual tools. I understand that if I make changes, it will clear the cache. But is there a chance to view the configuration at least? Or why are even the browse tools not showing the profiling? You cannot make any changes with the browse. The feature is really good but this would make it even cooler. 

Alteryx

Hi @Michal, as long as you don't change the configuration, you can still view it. Just click the link shown below to view the configuration. Then, when you're done, you can click off the tool and your cache will not be cleared.

[Image: config edit.PNG]

 

 

 

Asteroid

I had to revert to 2018.2 because many of my workflows have multiple inputs that I set to cache; otherwise they take too long to run (1/2 hr+). That's okay for the first run of the day, but not for multiple runs when you're developing and validating.

 

For these workflows, caching from a certain point forward isn't helpful because only one of the inputs stays cached while the others have to reload and take a very long time.  Why couldn't you keep the input data caching as well as cache from this point forward?  I imagine there are many users in this same predicament.

Alteryx

Thanks for the feedback @DE0413! It really does help us prioritize things higher on the list.

We're hoping to bring back the ability to cache from multiple points simultaneously in an upcoming release, stay tuned!

 

For yourself, or anyone else that would like to give that request a nudge, please "Star" this product idea: Allow caching of multiple tools at the same time

Atom

I'm glad you guys finally added this feature. I was using IBM SPSS Modeler back in 2013 and it had this feature!

One thing you still need to add (that SPSS Modeler had back in 2013) is the ability to right-click on a tool in a workflow and click "run workflow until here".

This way, when developing a workflow, you can work on sections in the middle and test the results without worrying about downstream sections. Together with caching, that will really help workflow developers.

Alteryx

Thanks for the suggestion @DanielW! We'll consider your idea for a future feature.

 

Also, @DE0413 - I have an exciting update. Our upcoming version, 2019.1, will have the ability to select multiple tools and cache them simultaneously in a single run. Hopefully, this will help with your development and prototyping. Please be sure to check it out and let us know what you think!

Meteoroid

The new cache tool is a big slowdown for complex workflows. How can we set cache as a default for certain inputs?

 

I have a complex workflow with many inputs, and it now runs very slowly while I develop enhancements, because the cache has to be set manually on every input each time.

 

Help soon!

Alteryx

Thanks for the feedback @Tim_Lang. I can see how this could be troublesome. Could you please post this idea in our Designer Ideas Exchange so others may chime in and push it up on the priority list?

 

Meteoroid

Thanks for the suggestion - if you agree, please vote for the idea.

 

It's posted under:

Cache Checkbox - Allow selection of cache with a checkbox embedded

Thank you for your tips, Alex!!!

I have saved a great deal of run time with Cache...

Atom

.@alexko 

 

Does the cache look up additional input files that I add to the input folder (I'm using a "Wildcard XLSX input" tool) after it is created in the workflow or does it "lock up" the input data when it is created so additional input data won't be pulled into the workflow at all?  Just curious.

 

Thanks!

Alteryx

@yyu3 Good question. Short answer is that it indeed 'locks up' at the time of creation.

 

Essentially what is happening is that a cache is written out after whichever tool/macro was used to create the cache, and then for the duration of workflow development that file is used as the data source rather than any upstream tools.

Atom

Thanks for the quick reply Alex, I really appreciate it!  
