In my conversations with people at Inspire, I hear a couple of topics come up frequently: How can I cache data that's already been processed, and how can I schedule workflows to run consecutively?
My answer to both of those questions is the Runner macro.
Yes, there is the cache data tool - but I actually have to do something with that. I have to change the settings from input to output.
Using the method I'm describing here, that caching is automatic.
The first step is to plan out what you want to do with your data, where you want to cache it, how you want to process it. Then you divide those steps into logical sections, and make workflows that only address one section at a time.
In the example I've attached, all I'm doing in workflow 1 is creating a .yxdb from my data. That's always a good place to start, because if you have, for instance, a .csv file coming in, Alteryx will process that same data exponentially faster than if it remains a .csv. Yes, there is some processing time to do the conversion, but everything else will run faster after that.
Then, you can make as many other workflows off that yxdb as you like. Maybe in one workflow you do all your joins, and in another you create all your calculations.
The scenario where splitting up my workflows makes the most sense is when I have multiple disparate data sources. I might have to do a lot of processing to get all that data to be homogeneous. But I need to perform different kinds of processing on each of those data sets. So after I convert each data source to a yxdb, I'll make a workflow to process data source 1, another to process data source 2, etc.
Finally, I'll make a workflow that re-unites all my other workflows.
The magic touch is to then create one last master workflow to run all the others, and this is where the Runner macro comes in. I add a Runner tool for each of my underlying workflows, separating them with Block Until Done tools.
Then, when I'm ready to process my data, I only need to run one workflow - the master workflow with the Runner tools - and everything else is automatic.
It makes it a LOT easier to fix things if they break (and we all know they do sometimes), and I don't have to sit and toy with the cache tool. I can hit run and walk away or do something else.
If you haven't tried this method, I encourage you to experiment. If you have any questions, I'm always here on the forums, so feel free to reply or ping me directly.
Happy analyzing!
Solved! Go to Solution.
Great use case.
I've just used this same approach for a CRS regulatory reporting project where client data was dotted about with 5 eventual outputs all including some elements from the customer record.
Address information for example was tricky with joint accounts, individual accounts, multiple source systems etc, requiring a decision tree depending on data completeness.
Considerations for this approach ;
1. The workflow and data folders get more cluttered.
2. Someone who doesn't understand might not know to run the controlling workflow.
3. Leave room in your numbering based naming convention for new steps, I ended up with 001_<workflowname> !
4. Name the intermediate yxdb the same as the workflow.
5. Drop/create an Excel output in \ExcelCopies folder with the same name and input as the yxdb
#5 saved me lots of time explaining my "workings" to non-Alteryx co-workers (poor souls!) who could investigate the data lineage if an output was queried.
Finally, think that the Message Expression needs to be "Check Underlying Workflow", quotes are missing at the moment ?
Thanks
dan
I put Tests on the Error outputs with checking for 0 recs, but that's only because I hadn't investigated messages before =)
Great post! I love these macros. I also always use these in combination with the log parser tools. This means you can see all the error messages in the actual 'Run All' workflow.
That's smart, hadn't thought of that, good productionising !
Followup question: do you feel the block until done is needed when using the conditional runner? Is success / fail only determined when the underlying job is complete?
I'm using these tools extensively including grabbing the output with log parser, appending timestamps, filtering for the underlying job name, and logging the info into my database so we have run history including timings between the underlying jobs.
But I hadn't thought about block until done being required. Obviously it wouldn't hurt, but does it actually make a difference here?
As I recall, the runner tools were running simultaneously without the block until done, and that was making me nervous. Granted, I started doing things this way a LONG time ago, and things might have changed.
Perhaps others who have experimented more can chime in here.
Hello @andre347 ! A quick check.
Happened to think in similar methods as you and building a bigger process. Apparently when I hit the rock, looks like you have already tried it out here. Have you faced issues where you happen to pass nulls to the list runner and the Macro don't fail, but run the macro assuming it got the path and passes out?
Heads up: I don't use conditional list runner as it fails to process if I have even a filed conversion error, which happens to be in most of my WFs due to incoming data at times. Can't rule out.
Logic:
Am trying to loop a conditional run based on the messages from prior WFs. So the PATH formula in the first stream send in the path to the second List runner only if few conditions are satisfied, if not I want to stop the process. So, I send a dummy file path ideally a NULL to second list runner so I want to error out. Also I collect logs to make some conditional run processing for a larger process.
ASK: When I run this logic on a new workbook it works as expected with the error, but when I save the WF and run it the control passes to the next list runner and process the consecutive pipes ahead of it.
Is there a specific reason for the macro to store the directory and behave like this? Any thought would be of great help!
Test:
Expected outcome.
Actual:
Post saving , differed outcome.
@ThizViz, thank you for the post!
I was not able to run the examples attached. Tried running the RunAll workflow first, but got error message importing (attached below). Then i thought maybe i need to run the sub workflows first to generate the inputs for the RunAll workflow. That did not help either. Am I missing something?
Thank you