Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
ned_blog
8 - Asteroid

I have been working on a blog post that is going to integrate a variety of different aspects of Alteryx together to paint a larger story. As I have been building towards it, I have run into a variety of challenges that have grown into blog posts in their own right. It started with Weighted Medians and continued on with Downloading from TIGER. Today I ran into a very common problem – I needed to process records in groups. The solution I outline is extensible to any macro; it should be a useful technique for any data artisan.

 

I wanted to change the Weighted Median macro in order to make it work with groups of records. Instead of a single weighted median value, I want to return a separate value per group. Sometimes it is possible to add a List control to a macro and have custom actions and make it all work, but it can be challenging. If you look at the weighted average macro that is built in to Alteryx, you will see it working this way, but if you look a little deeper, you will see how challenging it is. In that case it required an Update Raw XML action to do it.

 

In the case of the weighted median, the algorithm is complicated enough that it would be very difficult. Fortunately Alteryx has a mechanism to break any input into groups while it is processing – it is called a batch macro. A batch macro does 2 different things that are fairly different from each other. The first is the split a stream of records into groups and process them one group at a time. The second is to reconfigure the macro at runtime based on data coming in instead of configuration time information. In this case, we only need the former.

 

imageWe can accomplish our task very quickly just by converting the macro to a batch macro. The only issue with this is that consuming it now becomes a little more difficult. Turning it into a batch macro automatically creates a new second input labeled with a ¿ for the control parameters. In this case we have no control parameters, but that input is required. That input is what controls how many groups are going to be run.

 

We can quickly modify our module that consumes this macro to group by (Summarize tool) the records on the grouping field and pass that on to the control input. That gives one record per group. It is also important to go to the Group By tab on the properties of the macro and pick the grouping field.

 

As long as you are not consuming this macro in many places, this is a fine solution, but if you are giving this macro to other users it makes it difficult to consume. One of the luminaries in the world of computer science, David Wheeler, once said “All problems in computer science can be solved by another level of indirection.” This is exactly true in this case, we just need another level of macro. The answer is to create a new macro that presents the simple user interface and deals with the complexity of consuming a batch macro.

 

I started by saving 2 copies of the original macro, called WeightedMedian_Batch and WeightedMedianInner. The inner one I converted to a batch macro, and I also removed the questions and actions related to the field names. The only question left is the percentile. On the outer one, I removed all the tools that did all the work, but I kept the initial select tool. I also needed to keep the numeric up/down question for the percentile as well as creating an action to pass the value on to the inner macro. I added a new dropdown for Grouping field. In this case I check the box for “Include [None]”. I created a new action using the same technique as the older macro renaming the input field in a formula tool.

 

As an exercise for the reader: I allow the user to not select a grouping field (via “Include [None]”) but I am still using the same batch macro. How does it work?

 

The modules can be downloaded here.

 

Thanks for reading,

 

ned.

This post originally appeared at http://inspiringingenuity.net/2013/09/18/alteryx-processing-groups-of-records-in-a-macro/

Comments

Are there any other Alteryx resources re: "Group By" tab on configuring a Batch Macro?

In your example, you use the "Control GroupBy Field", and set the value to "__Macro_Group".

But you've also used a Summarize tool, with the Action "GroupBy" applied to the Field "__Macro_Group"

 

Isn't this redundant? Can we omit the Summarize tool? This example omits the Summarize tool

I also don't understand the difference between "Control GroupBy Field" and "Input 11 GroupBy Field" (aka the Control Input (¿) and the Macro [Data] Input (D)).

 

UPDATE:

Control GroupBy and Data GroupBy is effectively a JOIN.

To clarify my statement above, the other site gives two examples.

Neither examples use the Summarize tool.

The first example processes data (A (4x), B (3x), C(3x)), in batches (A, B) -- just 2 batches

The second example processes data (A (4x), B (3x), C(3x)) in batches ... driven by the data (A (4x), B ( 3x), C( 3x) )

 

I assumed Alteryx would do some DISTINCT'ing on the Control GroupBy field before running the macro (i.e. (A, B, C) -- just 3 batches) but it does not. Thus the macro runs 10 times (4 + 3 + 3 ) = 10

For me the JOIN analogy helps here. 

Group By is effectively INNER JOINING ...

  • some source that will tell it how many times to run, user must configure which field to JOIN (aka Control GroupBy Field)
  • and some data, subsets of which will be processed in each run, user must configurewhich field to JOIN (aka Data GroupBy Field)

 

In other words, 

 

SELECT * FROM Control
INNER JOIN Data
ON Control.[GroupBy Field] = Data.[GroupBy Field]

In this INNER JOIN, SQL will not do any distincting.

Both Alteryx and SQL will produce cartesian results:

That means we get data:

A (4x) joining on data A (4x) for 16 resulting A rows in total.