Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
ned_blog
8 - Asteroid

What are Batch Macros?

 

Batch macros are a new feature in Alteryx 5.0. They add some fairly simple capabilities that let you solve problems in ways that would otherwise be very difficult, if not impossible.

 

Batch macros are just like other Alteryx Macros, but with the addition of a special input we call the control input. Records that go in the control input don't become streams inside the macro like regular inputs. Instead, for each record, the entire macro will be reconfigured and run. You can use the fields in the control input records for two things: 1) to reconfigure the macro at run time, just as if they were the answers to questions in the macro GUI; 2) to group the records going in the other macro inputs into batches. Frequently, you're going to do both of these, but today we'll take them one at a time and look at a simple example that uses each of them.

 

Run Time Reconfiguration

 

For each record that comes in the control input, the batch macro will get reconfigured. You can use the fields in the record as variables to update the configuration.

Here is a blog post Ned wrote about a module that parses web server logs. Since their format is pretty arcane, it can only parse one log file at a time. Typically though, you'd like to parse a whole directory full of them at once. So in order to do that, we're going to turn his module into a batch macro. Starting with the original module, we'll go to the module properties, change the Module Type to Macro, and hit Create. This brings us to the Macro Design window, and the first tab there is "Batch Macro":

 

 

Starting from the top, obviously we're going to check the "Enable as Batch Macro" box. Next after that is:

 

  • Output Mode – Each time we run this macro it will produce a separate set of records on whatever output streams it has, so those will be merged together at the end by an implicit Union tool. But since we reconfigure the macro each time, the records coming out could have entirely different schema from one batch to the next. So just like the Union tool, there are three options to deal with that: 
    • All Iterations will have the same output schema - This is the simplest option; Alteryx just takes the schema from the first run and assumes the rest will be the same. If they are not, you get an error. The advantage here is that we don't need to wait for all the records to come through to start pushing things downstream.
    • Auto Configure by Name – This will wait until all the iterations get run, and combine the schemas by putting fields with the same name into the same field. When they have different types, one will be chosen that can hold all the values.
    • Auto Configure by Position – Just like by name, but fields are matched up by their position in the record. Field Names will be set based on whichever iteration is the first to have a field in that position.

If your macro has multiple outputs, they will all need to use the same output mode. For this example, we're going to choose Auto Configure by Name. Each time we parse a log file, we'll figure out which fields it has in it, but that can change if you configure the web server to log different things, so we want to be able to handle that.

 

  • Control Parameters – Here we identify the input fields from the control input we want to use to configure the macro . As noted, these act much like macro Questions; but since these questions will be answered by data in your control input, you don't need to worry about the gui, all you set is a name and a description. The description will show up in the configuration for this macro where the user chooses what field in his input to connect it to. The Name is what you will use in macro actions. For this example, our only control parameter is the path to the log file we want to parse. So now to finish up our Macro, all we need to do is go over to the Actions Tab, where we'll add a new Update/Change Value action to put the filename into our input tool:

 

 

The Control Parameter we added shows up as a variable to update from, just like the answer to a question on the questions tab would.

So that's all there is to it, now we can feed a directory tool into the control input of our macro, and parse all our log files:

 

 

Being able to reconfigure Alteryx processes based on data at run time is very powerful, but it does come with a cost. Usually Alteryx has all its configuration information up front, so all the overhead of configuration and initialization only happens once at the beginning. With a batch macro, it has to happen again at the start of each batch, which slows things down. If you're running a few dozen batches for a directory full of server logs, that will be fine. But you want to be careful you don't set things up to run a few million batches.

 

Running Records in Batches

 

The second thing batch macros can do is run records in batches. Our previous example didn't have any regular, non-control inputs, but what would happen if it had? For each record that comes in the control input, the macro will be reconfigured and it will run on the complete set of records that come in the regular inputs. Or, it can run on a subset of those records that match a field from the control input.

 

The example we'll look at here is a user request we got a long time ago, but there hasn't been a really good answer for before. The user has a bunch of store locations, and wants to make trade areas using non-overlapping radii. So far, no problem. But, these stores are divided up into separate regional divisions that have defined boundaries, and the trade areas need to conform to these. For some example data, I'm going to use county centroids as store locations, and New England states as regions:

 

 

 

The radii need to be non-overlapping, but only compared to other stores in the same region, because we're going to cut the trade area boundaries to the regional borders. There really isn't a good answer here. The user basically needed to run his module separately for each sales region, then combine the results. And that's exactly just what our batch macro will do, but now it will be easy.

 

The actual macro here is trivially simple:

 

 

Just input, a trade area tool with "Eliminate Overlap" checked, and an output. The macro has one question and an associated action to update the maximum radius, but if you look at the Batch Macro Tab:

 

 

you see that while "Enable as Batch Macro" is checked, there aren't even any Control Parameters. We don't need any, because we don't actually need to configure anything differently for each batch in this case, just run them separately from the other batches.

 

So how do we do that? Let's take a look at the module that uses this Macro:

 

 

Up at the top we read in the Locations file, which has the store location points, and a RegionId to tell us what region the store belongs to. We put this through a Summarize grouping on RegionId just to get a unique list of Ids for the control input. That way, we'll run one iteration for each Region. Looking at the configuration for our macro, we see that now that we have a regular input as well as a control, we get a new tab: Group By.

 

 

Here we can choose one field from our control input, and one field from each other input. If you pick a field for a particular record, each run of the macro will only use the records where that field matches the value from the control input. For any inputs that you don't pick a field for, you'll get the whole stream again for each iteration. Note that the terminology can be a little confusing: "Group by" here doesn't mean quite the same thing as it would in the summarize tool; we're not combining records, just arranging them in batches.

 

So that's it, now we can run our module and the batch macro will generate radii for each location, but make them non-overlapping only compared to the other locations in their region, and then we can trim the results to a region boundary:

 

 

So you can see that batch macros do a couple of things that are actually fairly simple on their own. But when you combine them with each other, and everything macros and Alteryx as a whole can do, they can be extremely powerful.

 

Download this example here.