I look at Block Until Done as a queue of row data. Once all of the records have been read from the upstream data, the first gate is opened. Once all of the data has passed through the first gate, the data is passed to the second gate.
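The "queue with gates" description above can be made concrete as a toy Python model. This is a sketch under stated assumptions, not how the Alteryx engine is implemented: `block_until_done`, `Record`, and `Consumer` are hypothetical names, and each downstream tool is reduced to a callable that receives one record at a time. In this model a gate is finished once every record has been handed to the tools on its anchor; whether those tools have finished their own work is not part of the model.

```python
from typing import Callable, Iterable, List

Record = dict
Consumer = Callable[[Record], None]  # hypothetical stand-in for a downstream tool


def block_until_done(upstream: Iterable[Record], anchors: List[List[Consumer]]) -> None:
    """Toy 'queue with gates' model of Block Until Done (not Alteryx engine code).

    anchors[0] holds the tools wired to output anchor 1, anchors[1] to anchor 2, etc.
    """
    # Read every upstream record into the queue before any gate opens.
    queue = list(upstream)
    # Open the gates one at a time: every record goes through gate 1
    # before any record reaches gate 2.
    for consumers in anchors:
        for record in queue:
            for consume in consumers:
                consume(record)


if __name__ == "__main__":
    log = []
    block_until_done(
        [{"id": 1}, {"id": 2}],
        [
            [lambda r: log.append(("anchor 1", r["id"]))],
            [lambda r: log.append(("anchor 2", r["id"]))],
        ],
    )
    print(log)  # [('anchor 1', 1), ('anchor 1', 2), ('anchor 2', 1), ('anchor 2', 2)]
```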
If you want all of the data to go through macro 1 before any of it goes to macro 2, I would do the following:
After macro 1, get a count of all of the output rows. You won't actually use the count; it is just a step that requires macro 1 to complete.
Put the count of all rows (a single record) into the Source (bottom) input of an APPEND FIELDS tool. Put the input data (the same data that went into macro 1) into the Target input of the Append Fields tool. Now configure the APPEND FIELDS tool to uncheck the count field from the source.
What will happen now is that macro 2 can't start until macro 1 is done. Connect the output of the Append Fields tool to macro 2.
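The Append Fields workaround can be sketched with Python generators, which, like the tools on the canvas, only do work when something downstream pulls records from them. This is a hypothetical model, not Alteryx code: `macro1`, `macro2`, `count_records`, and `append_fields` are made-up stand-ins, and the sketch assumes (as the steps above rely on) that Append Fields reads its whole Source input before emitting any Target rows. Because macro 2 reads from Append Fields, and Append Fields cannot emit until the count has consumed all of macro 1's output, macro 2 sees no rows until macro 1 is done.

```python
from typing import Iterable, Iterator, List

Record = dict
events: List[str] = []


def macro1(rows: Iterable[Record]) -> Iterator[Record]:
    # Hypothetical stand-in for macro 1 (e.g. the DB insert).
    for row in rows:
        events.append(f"macro1 row {row['id']}")
        yield row


def macro2(rows: Iterable[Record]) -> Iterator[Record]:
    # Hypothetical stand-in for macro 2 (uses what macro 1 wrote).
    for row in rows:
        events.append(f"macro2 row {row['id']}")
        yield row


def count_records(rows: Iterable[Record]) -> Iterator[Record]:
    # Count of all rows: emits its single record only after every row has arrived.
    yield {"Count": sum(1 for _ in rows)}


def append_fields(targets: Iterable[Record], source: Iterable[Record],
                  drop: Iterable[str] = ()) -> Iterator[Record]:
    # Assumed Append Fields behavior: read the whole source input first, then
    # append every source record (minus unchecked fields) to every target record.
    drop_set = set(drop)
    source_rows = [{k: v for k, v in s.items() if k not in drop_set} for s in source]
    for row in targets:
        for extra in source_rows:
            yield {**row, **extra}


inputs = [{"id": 1}, {"id": 2}]
# Macro 1's output only feeds the count; the original inputs go to the Target side.
gated = append_fields(inputs, count_records(macro1(inputs)), drop=["Count"])
result = list(macro2(gated))
print(events)  # ['macro1 row 1', 'macro1 row 2', 'macro2 row 1', 'macro2 row 2']
print(result)  # [{'id': 1}, {'id': 2}]
```

The Count field itself is thrown away; only the dependency it creates matters, which is why unchecking it in Append Fields leaves macro 2's input unchanged.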
My question is: why do I see a completion percentage on gate 2 while gate 1 is still running? The workflow pictured is simply my hacky way to manage flow control (the Achilles' heel of Alteryx, IMO). The input is just one dummy row; the first macro makes an insert into my db, and the second uses the newly inserted data.
As I mentioned in my first post, the job is running perfectly in sequence; I am more curious about the percentages being displayed. It's not how I would expect Block Until Done to work, and I had to do a bunch of validation to make sure things happened in sequence.
Wow, thank you so much Mark! While this may not have directly answered roadhouse's question, you just helped me solve a puzzle that was mind-boggling! This is exactly what I needed to do to get a daily workflow to make a "yesterday's" copy of a yxdb before it writes over it with the new data for today.
I am trying to troubleshoot some Block Until Done issues I have run into, similar to what @roadhouse was seeing in his original post. I have heard the BUD functionality explained similarly to your gate description, but I have found certain instances where this does not appear to be the case (see attached video).
The gate explanation does not seem to hold here, given that out3.csv is not written as soon as the data clears Gate 1 and hits the Select tool. It actually does what I would expect the BUD tool to do, which is wait until the workstream outputting out2.csv completes before kicking off Gate 2 and outputting out3.csv. That being said, I have also seen the exact scenario from the original question in this thread occur.
In my case, I have macros that conclude with a Run Command tool but no Output Data tool. My next test is to modify some of these macros to have a dummy output, to test the theory that all outputs on a branch have to complete, but I wanted to see whether there is an explanation for the behavior shown in this video given my current understanding of BUD behavior. I have also used the Append Fields tactic in the past, but it can get a bit busy when trying to control the order of execution in a complex workflow.
the completion percentages are associated with the two macros (not the Block Until Done tool). so, as @MarqueeCrew stated earlier, the entire dataset passed thru the first output anchor (i.e., "1") before the entire dataset passed thru the third output anchor (i.e., "3"). the completion percentages of the respective macros are somewhat immaterial. i will posit that this is not very intuitive, and that what you're experiencing (and expecting to see) isn't uncommon.
Thanks @GarthM. This makes sense to me that the percentages are based on the progress of the macros themselves. What does not make sense to me is the gate explanation of releasing all of the data on Anchor 3 as soon as all of the data is released on Anchor 1. I have heard this explanation before, and the video that I attached to my post seems to contradict this description of the BUD behavior. Any ideas what might be holding up the data from being output to out3.csv as soon as the data hits my select tool?
while i'm not 100% sure, my guess is that the exchange between the engine and the plugin (i.e., the tool you see on the canvas) for output #1 hasn't finished yet. while that seems to contradict what you're witnessing on the canvas, there's more to the "gate" explanation than the basic description i provided earlier (my apologies). the following is my extremely shallow understanding of what occurs between connected tools in a data stream:
for starters, passing all records to a downstream tool isn't the final step for a given output in a tool. for tools with multiple outputs (e.g., BUD), there may be a need to notify both the downstream tools (e.g., the Select tool connected to #1) and any "sibling" outputs (i.e., #2 & #3) that data is coming, so they can start preparing for it (i.e., initialize). in addition, most downstream tools must report back to the upstream tool that "yes...we got your message and are receiving (or have received) the data you warned us about." that message is part of what allows Alteryx to terminate the process for output #1 and free up resources for the next process (output #2).
in the case of BUD, and while i don't know for sure, right before Alteryx releases its stranglehold on output #1, it creates a shadow copy of the data that gets shared with output #2. once output #2 has received the shadow copy, output #1 gets destroyed. so, my guess is that the process for output #1 has yet to fully exit; hence why we see all of the records being reported by the Select tool even though BUD is still at 60%. i believe that the BUD progress is reporting how far along it is in creating a shadow copy of the data for output #2. given that output #2 will be sending the data to an output tool, it needs to know the size and structure of the data before telling the downstream tool (e.g., Output::out3.csv) to get ready for a delivery.
keep in mind, everything i've suggested here could be totally wrong. i'm simply venturing a guess based on my (juvenile) understanding of how the engine communicates with our tools.
Hey @GarthM, I appreciate the explanation, and I think I am following. The thing that still doesn't add up in my mind is this: if it is a race condition between preparing data for Gate 2 and disposing of data for Gate 1, shouldn't I be able to increase the delay on the Wait A Minute tool enough to give these BUD operations time to complete and for Gate 2 to release? If Gate 1 doesn't care about the operations of downstream tools, it should see that the Select and subsequent Summarize have all they need from the incoming data stream and can free up those resources. However, if I increase the Wait A Minute to a full minute, out3.csv is still not produced until after the full minute expires.
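The two readings of Block Until Done in this thread predict different results for the Wait A Minute experiment. The toy timing model below computes when each gate would open if a gate only waits for the previous anchor to hand off its records, versus if it waits for that anchor's whole downstream branch to finish. `Branch` and `gate_open_times` are hypothetical names, the durations are made up apart from the 60 seconds that mirrors the one-minute test, and this illustrates the two hypotheses rather than describing the Alteryx engine.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Branch:
    name: str
    handoff_seconds: float  # time to pass every record into the first tool on the anchor
    rest_seconds: float     # time for the rest of the branch (e.g. Wait A Minute, an Output)


def gate_open_times(branches: List[Branch], wait_for_branch: bool) -> List[Tuple[str, float]]:
    """When each gate opens under two hypothetical readings of Block Until Done.

    wait_for_branch=False: a gate opens once the previous anchor has handed off its records.
    wait_for_branch=True:  a gate opens only after the previous anchor's whole branch finishes.
    """
    clock = 0.0
    opened: List[Tuple[str, float]] = []
    for branch in branches:
        opened.append((branch.name, clock))
        clock += branch.handoff_seconds
        if wait_for_branch:
            clock += branch.rest_seconds
    return opened


if __name__ == "__main__":
    branches = [
        Branch("anchor 1", handoff_seconds=1.0, rest_seconds=60.0),
        Branch("anchor 2", handoff_seconds=1.0, rest_seconds=0.0),
    ]
    print(gate_open_times(branches, wait_for_branch=False))  # [('anchor 1', 0.0), ('anchor 2', 1.0)]
    print(gate_open_times(branches, wait_for_branch=True))   # [('anchor 1', 0.0), ('anchor 2', 61.0)]
```

Assuming the Wait A Minute tool sits on the first anchor's branch, out3.csv appearing only after the full minute is what the `wait_for_branch=True` case predicts, while the pure hand-off model would have released the next gate after about a second.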