In my conversations with people at Inspire, I hear a couple of topics come up frequently: How can I cache data that's already been processed, and how can I schedule workflows to run consecutively?
My answer to both of those questions is the Runner macro.
Yes, there is the Cache Data tool - but it requires manual effort every time. I have to change its setting from input to output myself.
Using the method I'm describing here, that caching is automatic.
The first step is to plan out what you want to do with your data, where you want to cache it, how you want to process it. Then you divide those steps into logical sections, and make workflows that only address one section at a time.
In the example I've attached, all workflow 1 does is create a .yxdb from my data. That's always a good place to start, because if you have, for instance, a .csv file coming in, Alteryx will process the same data far faster from a .yxdb than from the .csv. Yes, the conversion adds some processing time up front, but everything after that runs faster.
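The same "convert once, reuse the fast copy" idea applies outside Alteryx too. Here's a minimal Python analogy (not Alteryx itself; the file names and function are hypothetical): parse the slow CSV once, write a binary cache, and load the cache on every later run.

```python
# Analogy only: the binary cache plays the role of the .yxdb.
import csv
import pickle
from pathlib import Path

def load_with_cache(csv_path: str, cache_path: str) -> list[dict]:
    """Parse the CSV once; afterwards read the much faster binary cache."""
    cache = Path(cache_path)
    if cache.exists():
        # Fast path: the cached binary already holds the parsed rows.
        return pickle.loads(cache.read_bytes())
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # One-time conversion cost, like writing the .yxdb in workflow 1.
    cache.write_bytes(pickle.dumps(rows))
    return rows
```

Every downstream "workflow" then calls `load_with_cache` and gets the cheap binary read instead of re-parsing the CSV.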
Then you can build as many other workflows off that .yxdb as you like. Maybe in one workflow you do all your joins, and in another you create all your calculations.
The scenario where splitting up my workflows makes the most sense is when I have multiple disparate data sources. Getting all that data homogeneous can take a lot of processing, and each data set needs a different kind of processing. So after I convert each data source to a .yxdb, I'll make one workflow to process data source 1, another for data source 2, and so on.
Finally, I'll make a workflow that re-unites all my other workflows.
The magic touch is to then create one last master workflow to run all the others, and this is where the Runner macro comes in. I add a Runner tool for each of my underlying workflows, separating them with Block Until Done tools.
Then, when I'm ready to process my data, I only need to run one workflow - the master workflow with the Runner tools - and everything else is automatic.
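For readers outside Alteryx, the master-workflow pattern is just sequential orchestration: run each step to completion before starting the next, and stop the chain on the first failure. A rough Python sketch (not the Runner macro itself; the step commands are hypothetical):

```python
# Sketch of "Runner + Block Until Done": each step blocks until done,
# and a failing step halts everything after it.
import subprocess
import sys

def run_chain(commands: list[list[str]]) -> list[int]:
    """Run commands one at a time; abort the rest if one fails."""
    codes = []
    for cmd in commands:
        result = subprocess.run(cmd)  # blocks until this step finishes
        codes.append(result.returncode)
        if result.returncode != 0:
            break  # "stop on error": skip the remaining steps
    return codes
```

The `subprocess.run` call is the Block Until Done of the analogy: step 2 cannot start while step 1 is still running.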
It makes it a LOT easier to fix things when they break (and we all know they sometimes do), and I don't have to sit and fiddle with the Cache Data tool. I can hit Run and walk away or do something else.
If you haven't tried this method, I encourage you to experiment. If you have any questions, I'm always here on the forums, so feel free to reply or ping me directly.
Great points! When I did this in the "real world," I had an elaborate numbering and naming system. The runner chain would usually be something like "001 Run Monthly Resorts", with the underlying steps living in sub-folders.
And yes, you're correct about the missing quotes. I would also change the messages to specifically name the underlying workflow.
It also helps if the underlying workflows and the master workflow are configured to stop running on error.
Follow-up question: do you feel Block Until Done is needed when using the Conditional Runner? Is success/failure only determined once the underlying job is complete?
I'm using these tools extensively: grabbing the output with Log Parser, appending timestamps, filtering for the underlying job name, and logging the info into my database so we have a run history, including timings between the underlying jobs.
But I hadn't thought of Block Until Done as required. It obviously wouldn't hurt, but does it actually make a difference here?
As I recall, the Runner tools ran simultaneously without Block Until Done, and that made me nervous. Granted, I started doing things this way a LONG time ago, and things may have changed since.
Perhaps others who have experimented more can chime in here.
I'd been thinking along similar lines and building a bigger process, and when I hit a wall I found you had already tried it out here. Have you run into cases where a null is passed to the List Runner and the macro doesn't fail, but instead runs as though it received a valid path and reports success?
Heads up: I don't use the Conditional List Runner, because it refuses to proceed if there is even a field conversion error, which happens in most of my workflows due to the incoming data. I can't rule those out.
I'm trying to loop a conditional run based on the messages from prior workflows. A PATH formula in the first stream sends the path to the second List Runner only if certain conditions are satisfied; if not, I want to stop the process. So I send a dummy file path, ideally a NULL, to the second List Runner so that it errors out. I also collect logs to drive conditional processing for the larger process.
ASK: When I run this logic in a freshly opened workflow, it errors out as expected, but once I save the workflow and run it again, control passes to the next List Runner and it processes the subsequent pipes.
Is there a specific reason for the macro to store the directory and behave like this? Any thoughts would be a great help!