We are in the final days to enter our Data and Analytics - "What's your why?" raffle with SparkED!

Reply with your "why" in text/video format to our forum post by January 28 to be entered! 

Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

Controlling the Order of Operations

DanH
Alteryx
Alteryx
Created

Sometimes, especially when interacting with tools/files outside Alteryx, you need to explicitly control the order of tool execution. The classic example is a process whereby you 1) INSERT to a SQL table where an auto-incrementing ID is applied 2) read the records back in order to get the auto-incremented id 3) insert the auto-incremented ID into a different table.

 

In Alteryx worflow streams that terminate (as in an Output or Render), you can't reliably use Block Until Done to control the stream. One way to reliably control the order of events is with batch macros.

 

Batch macros guarantee that 1) every iteration is complete before the next iteration begins, and 2) that all iterations are complete before the macro reports itself complete. So if you place a process inside a batch macro and create an output stream from that macro, all processing will be done and the macro still can exist in the middle a workflow stream. You don't even need a Block Until Done as part of the process.

 

Please see the attached examples

Attachments
No ratings
Comments
Claje
14 - Magnetar

Hi,

Correct me if I'm wrong, but in the attached example I believe that the batch macro is inadvertently running once for every single record.  In this use case it isn't a large impact as the file is small and is overwritten on each iteration, but this could be an issue if running SQL or appending data.

The solution to this is to either use a sample or a summarize tool when connecting to the Control Parameter input in order to ensure that there is only one record coming in with the "batch" value.

 

claje_runbatchonce.PNG

DanH
Alteryx
Alteryx

Hi @Claje , 

I updated the workflow a while back. Should be cleaner now.

Thanks,

Dan

DanH
Alteryx
Alteryx
SeanAdams
17 - Castor
17 - Castor

Hey @DanH  - this is how we do this in our team (use batch macro with fake control param).

 

Given how common this need is - are you able to influence the team to get a simple order of ops into the product without having to use dummy batch macros and fake control param.    

 

Cc: @Treyson 

DanH
Alteryx
Alteryx

Hi @SeanAdams - I can't really speak to the road map in this regard. I've raised it with our Product Management team. I believe it's more complex than it appears, which is why it hasn't been developed so far.

 

Out of curiosity, what is your use case? 

SeanAdams
17 - Castor
17 - Castor

Hey @DanH 

 

There are many use cases that require order of operations:

- Loading dimension tables before loading fact tables

- Move a file into a different folder once the import is complete and successfully validated (standard ETL behaviour)

- Doing enrichments only after fully loading the dependant tables (for example - bring in updates on the client table and persist to the DB; then bring in the updates to the products table and persist in the DB; then the client product elligibility copied into the DB - only once you've done both of these can you run the enrichment process to create a new enriched table within the analytics DB.

- Running a reindex; and statistics update once all the fact & dimension tables loads are done

 

In terms of complexity - I must admit I'm a little surprised - if it can be done with a fake batch macro and a fake control param, then an easy way to implement this would be to use this same behaviour as already exists but without the requirement to do the batch / control param bits (in other words - this is just a strip-down of batch macros to a batch of 1 with no control param, you already have the capability).

Michael_Draper
6 - Meteoroid

I, too, would be interested in a more simple solution to this problem. My use cases typically involve workflows that have modular pieces with intermediate datasets. Here is an example mocked up:

 

Michael_Draper_0-1598907778196.png

This is a process that repeats regularly, but sometimes I want to run the whole thing (say, monthly) and sometimes I can save time and run only the "step 2" container (say, weekly). If, however, I run the whole thing and the input in "step 2" runs before the output at the end of step 1 completes, I would inadvertently be using an older copy of the intermediate data set.

 

In your example workflow you mention Tool ID's as one potential way to control order of operations (although you mention limitations). How does that work? I can see that batch macros will work (I think chained analytic apps would also get the desired result), but this is a common scenario I am running into and those solutions are a little more cumbersome than I am hoping for. Wondering if there's a better way. Thanks!

mutama
Alteryx
Alteryx

Hi @Michael_Draper ,

 

Have you tried using Crew Macros’ Parallel Block Until Done tool? See here for a detailed walkthrough of its logic: https://www.bigmountainanalytics.com/control-the-order-of-operations-in-alteryx/

 

Best,

Michael Utama

Michael_Draper
6 - Meteoroid

Thanks for the response @mutama. I have been able to use the parallel block until done in certain situations, but it still does not handle all use cases.

 

As an example, the linked article handles the situation where we want to input data before we overwrite it, but we still have no easy way to write over a data set before we input it. Maybe that is not common or not best practice, but none of the block until done tools will work on tools without an input connector, to my knowledge. I've also run into trouble here with In-DB tools, where I want certain SQL blocks to run after others, without necessarily having them all connect in-line with tools (such as having multiple "Connect In-DB" tools).

 

Currently I break workflows out into multiple files and use the Crew Runner macros to achieve this. Which works, but a native solution would be nicer. I still believe there has to be a better way. Maybe a different kind of container that acts like a batch macro? It would essentially be the batch macro hack demonstrated by OP, but packaged in a way that all the tools can appear in the same workflow. The container itself would have an input connection that only waits for data (does not actually do anything with that data/this is your dummy control parameter) to begin running the tools inside it. Just a thought.