Hi,
I have a batch macro with three output streams, each saving results into a separate tab of an Excel file (for each iteration the tab names are different, if that matters).
The problem is that when multiple streams try to save to the file at the same time, I often get an error that the file can't be accessed or that the backup file can't be created.
Therefore I need a solution that forces the macro to write the data sequentially (stream 1, then stream 2, then stream 3) to avoid the errors described above.
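To make the goal concrete outside of Alteryx terms: the behaviour I'm after is plain sequential writes to a single workbook. A minimal Python sketch of that behaviour (pandas with openpyxl assumed; the frames and sheet names are hypothetical):

```python
import pandas as pd

# Stand-ins for the three streams (hypothetical data).
df1 = pd.DataFrame({"RecordID": [1, 2], "Value": ["a", "b"]})
df2 = pd.DataFrame({"Value": ["a"], "RecordCount": [2]})
df3 = pd.DataFrame({"TotalRecords": [2]})

# One writer handle: the file is opened once and the sheets are
# written strictly one after another, so the writes cannot collide.
with pd.ExcelWriter("results.xlsx") as xl:
    df1.to_excel(xl, sheet_name="Stream1", index=False)
    df2.to_excel(xl, sheet_name="Stream2", index=False)
    df3.to_excel(xl, sheet_name="Summary", index=False)
```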
I tried a "block until done" tool, which seems to be designed especially for such purpose, but it doesn't really work.
As far as I understood after some digging, it doesn't really wait to run next stream until previous one is finished, it just waits until the full dataset will come through the specific output of the tool (so i.e. if all records run through output 1 it will start delivering data to the output 2, regardless if the stream after the output 1 is completed or not).
I tried to use throttle tool but also with no big luck, especially that the source table can have 100 or 10000 records, so it is difficult to set proper throttle value to not slow down the process too much.
Do you have any idea how to handle that?
I know that the newest releases have some functionality which might help, but unfortunately my corporate Alteryx is on version 2021.3, with no chance of an upgrade soon.
@kwieto your original error is likely coming from Alteryx attempting to overwrite multiple tabs within the same file while that file is still having records written to it. Said differently, Alteryx is attempting to run multiple processes against the same file at the same time, and it just can't.
There are a couple of ways to go about it, but I would definitely need to understand your use case a bit more. Does the file you are writing to have a specific format (e.g. a company template) that the records need to be populated into? If not, are the records unique to those specific tabs?
Yes, that's exactly the case.
The output file doesn't need to have any specific format; I am free to choose how the data is saved.
The workflow compares the data from the source table against a specific template, then lists the records with incorrect values (values not present in the template). It also tries to find the closest matches using fuzzy match logic.
Then there are three streams (a rough sketch of their contents follows the list below):
1. Output the data to the Excel file (a list of values with the record ID and the match given by the fuzzy match logic, if one was found). That's the first tab.
2. Output the same data as in stream 1, but aggregated by value and with a count of the records containing each value. That's the second tab.
So if, for example, the first tab has five records with the same wrong value, the second tab will have only one record with that value and "5" as the count of records affected.
3. Some statistical data, like the number of records in the source file and the numbers of records containing wrong and correct values. This is the third tab.
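For illustration, here is a rough pandas sketch of what the three streams contain, with made-up column names and data:

```python
import pandas as pd

# Hypothetical input: one row per source record for a single checked
# column, already flagged against the reference template.
src = pd.DataFrame({
    "RecordID":   [1, 2, 3, 4, 5, 6],
    "Value":      ["Warszawa", "Warszwa", "Warszwa", "Krakof", "Krakow", "Warszwa"],
    "IsValid":    [True, False, False, False, True, False],
    "FuzzyMatch": [None, "Warszawa", "Warszawa", "Krakow", None, "Warszawa"],
})

wrong = src[~src["IsValid"]]

# Stream 1: non-aggregated list of wrong values with their closest match.
stream1 = wrong[["RecordID", "Value", "FuzzyMatch"]]

# Stream 2: the same data aggregated by value, with a record count
# ("Warszwa" appears three times -> one row with count 3).
stream2 = (wrong.groupby(["Value", "FuzzyMatch"], dropna=False)
                .size().reset_index(name="RecordCount"))

# Stream 3: summary statistics for this checked column.
stream3 = pd.DataFrame([{
    "CheckedColumn":  "City",
    "TotalRecords":   len(src),
    "WrongRecords":   len(wrong),
    "CorrectRecords": int(src["IsValid"].sum()),
}])
```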
The workflow is designed as a batch macro such that each batch (iteration) takes data from a different column of the source file and compares it against the respective column of the reference file.
For each batch, separate tabs are created with the non-aggregated and aggregated values (named after the checked fields), but the statistical data is appended to the same, already created tab.
So if you have, say, 4 batches (4 columns to be compared), the output file will have 9 tabs in total:
- 4 tabs with non-aggregated wrong values (one per checked column),
- 4 tabs with aggregated values (same logic as above),
- 1 tab with the summary for all checked columns.
There is no problem between batches, but the three streams within a batch often (though not always) overlap.
EDIT:
The output schema of streams 1 and 2 is almost identical (the only difference is the additional record-count field), so I thought I might combine these two streams and create a batch macro inside the macro, such that e.g. the first batch outputs the non-aggregated data and the second batch outputs the aggregated data. But I still have no idea what to do with the third stream, where the structure is totally different.
And, to make things harder, the original macro has control parameters with which I can turn each stream on or off separately, and it would be nice to keep that functionality.
I can always use these parameters and run the workflow three times, each time with only one output switched on, but I would like to be able to run everything at once.
TLDR: If you have a sample data source (anonymized, of course), I can create this for you.
________________________________________________
Definitely an interesting use case here, and the details above provide some needed clarity around what you are attempting to do.
Since you do not need to output in a specific format, I would recommend handling these functions as separate streams using a nested batch and standard macro approach. The batch macro serves as the mechanism for batching sections of records through based on common categorical or numeric primary keys, while each standard macro handles one unique process.
First, bring your data into the batch macro after whatever data wrangling you have done. Then I would recommend nesting separate standard macros within the batch macro to handle your specific data aggregations and summary statistics computation.
Prior to outputting to the destination file, I would put a Formula tool right before your Output tool that creates a dynamic file path, written in a way that forces Alteryx to write to that specific sheet. Since it appears you do not need any additional processing after the nested batch/standard macros, you can place your Output tools directly within the batch or standard macros.
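In other words, the Formula tool appends the sheet name to the file path, and the Output tool (configured to take the entire file path from that field) then writes each stream to its own tab. If memory serves, .xlsx outputs use the path|||sheet form; a plain-Python sketch of building such paths, where the file name and field names are assumptions:

```python
# Hypothetical output file and checked columns.
base_path = r"C:\temp\results.xlsx"
for field in ["City", "Country"]:
    for suffix in ["", "_agg"]:
        # e.g. C:\temp\results.xlsx|||City_agg
        print(f"{base_path}|||{field}{suffix}")
```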
I have to admit that I don't fully get how I should combine the standard macros inside the batch one (I know how to do it, but I have no idea what these macros should do).
So I'm attaching two files: one with the source data and the reference table (each in a separate tab), and one with an example of how the output data should look.
Of course I know how to compare the source and reference data and get the results; I'm just interested in the "output" part so I can incorporate it into my workflow. But with the example output you can see what I am aiming for :)
OK, I seem to have solved the problem by myself.
As a basis, I took the recipe for a macro that saves data into multiple sheets from here:
How to output separate excel sheet for each data stream (alteryx.com)
I did some fine-tuning of the solution presented in the link: mainly I changed how the tab name is supplied for saving (not as a control parameter, but from a chosen field in the source table), and I added a way to modify the path to the output file.
But after some test runs I realized that this doesn't solve my problem; there are still cases where I get the "access denied" error.
I tried to slow the process down with the Throttle tool, but it seems to behave differently in a regular workflow than inside a macro (when used in a macro it doesn't slow the stream down as much as in a standard workflow; I don't know why).
There is a CReW macro named "Wait a Sec" which should be capable of doing the job, but unfortunately I can't install it on my corporate machine.
So I took a solution described by @BretCarr in this thread: Solved: Re: Throttle an Iterative Macro - Alteryx Community
And that did the job. As I wrote above, inside the macro it doesn't slow things down as much as the same solution does in a normal workflow. For example, when I run the workflow with the abovementioned throttling on its own, one run takes about 15 seconds. But when I put it into the main macro, which makes 12 iterations (so the "output" macro runs 12 x 3 = 36 times, which at about 15 s each should take about 9 minutes in total), the whole workflow finishes in less than 1 minute, though playing with the throttle settings changes that time a little (but not as much as I would expect).
I don't know why that is.
Anyway, it still slows the process down enough to avoid the "access denied" error, as I tested on a few different files.
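For what it's worth, the same wait-and-retry idea can be expressed in plain Python. This is only a sketch of the concept (the function and its parameters are made up), not what the macro literally does:

```python
import time

def write_with_retry(write_fn, attempts=5, delay=2.0):
    """Retry a write that may hit a transient file lock."""
    for attempt in range(attempts):
        try:
            return write_fn()            # e.g. lambda: df.to_excel(...)
        except PermissionError:          # the "access denied" case
            if attempt == attempts - 1:
                raise                    # give up after the last try
            time.sleep(delay)            # wait for the lock to clear
```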
The modified workflow looks as below. Inside the "output" macro:
Two control parameters handle the path to the output file (the right one) and switch the save mode for the stream containing the "summary" data (from "Overwrite" to "Append").
The "throttle" path selects the field containing the stream number (the same for all records), aggregates it so that the stream after the Summarize tool contains only one record, adds some additional records for the Throttle tool, limits the output to the first record only, and then uses a Find Replace tool to replace that data with the identical data from the "throttled" stream.
This way I can be sure that no data is passed to the Output tool until the throttled (lower) stream has finished.
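Conceptually the throttle path is a barrier; in plain Python the same gating idea might look like this (a sketch only, with hypothetical streams):

```python
def gated(records, blocking_stream):
    """Yield records only after blocking_stream is fully consumed."""
    for _ in blocking_stream:   # the barrier: drain the slow stream first
        pass
    yield from records          # only now may data reach the output

# Hypothetical usage: rows flow on only once the throttled side is done.
rows = list(gated(iter([1, 2, 3]), iter(["throttled", "records"])))
```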
The macro is fitted into the original workflow like below:
As you see, it is generally handled as in the recipe described in the first link (combining all three outputs into one with a Union tool, then using a macro to split them into tabs of the output file), just with some modifications.
@kwieto - the Block Until Done tool is great for writing to multiple data tabs without a macro. You just have to be strategic about how you use it.
Check out a copy of the workflow in the link below from our CT AUG meeting in August. The way I have done it (I had a 30-tab workbook) was a continuous string of Block Until Done tools.
Basically, the concept is to put it at the end of the first output and force a Count Records result that you append to the next output (deselect the actual count from being appended, though), so that the next output cannot proceed until the data has been appended to it. You can keep attaching Block Until Done tools to create a process that is forced to update one tab at a time without being locked out.
No macros required.
Hopefully this is helpful to you. Have a great weekend.
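A rough Python analogue of the chained Block Until Done / Count Records idea described above: each write returns a count that the next write depends on, so the tabs can only be updated one at a time (Python is sequential anyway, so the token is purely illustrative, and all names are made up):

```python
import pandas as pd

def write_tab(df, writer, sheet):
    df.to_excel(writer, sheet_name=sheet, index=False)
    return len(df)  # the appended record count acts as a completion token

tabs = {"Tab1": pd.DataFrame({"a": [1]}),
        "Tab2": pd.DataFrame({"b": [2]}),
        "Tab3": pd.DataFrame({"c": [3]})}

with pd.ExcelWriter("output.xlsx") as xl:
    token = None
    for name, df in tabs.items():
        # Each write is gated on the previous write's token,
        # never on a partially finished stream.
        token = write_tab(df, xl, name)
```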
@DanielG thanks for the input; the thing is that I already tried combining the Block Until Done and Count Records tools in a very similar way, and it didn't work for me.
My assumption was that since the Count Records tool needs to receive all records before calculating the total, it would successfully prevent the data from being output before the upper stream finished.
And while it worked during testing, after implementing it in the batch macro and running a couple of iterations it always resulted in an "access denied" error somewhere in the middle (usually not in the first iteration, but in one of the following ones).
I guess the reason is that if the output table is small, say a few records, the Count Records tool doesn't really slow anything down (or not enough to make a difference).