Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
KurtS
Alteryx Alumni (Retired)

The new Charting tool that will be in 6.0 contains a chart type called "Box and Whisker". For those of you who are not familiar with that type of chart, basically it is a statistical type of chart, which represents a "normal distribution" as if it were viewed from above the curve, looking down on it.

 

Here is a part of one of these types of charts:

Five pieces of data are needed for each entry in the chart. They are the minimum, lower quartile, median, upper quartile, and maximum. The "box" part of each entry is the middle 50% of the distribution (with the median marked inside of it), which goes from the lower quartile to the upper quartile. The "whisker" parts represent the remaining 50% that is outside of the middle 50%, marked by the minimum and maximum values for the data.

 

Note that dependant on your data, your minimum or maximum may show up as a point on the chart, instead of with the "whisker" line. In this case, it is considered an "outlier". (See the resulting charts generated by the sample module to view these outliers.) Also, if your data has a very small amount of deviation, the "box" may be rather short, with the minimum and maximum noted only as outliers.

In Alteryx, the Summarize tool will get us the minimum, median, and maximum from a set of numeric data. But, what about the lower and upper quartiles? Well, we can get those by doing a small bit of extra manipulation of the data, and again using the median functionality of the Summarize tool to give us the actual values. Basically, we can think of the lower quartile as the "median" of the lower 50% of the data, and the upper quartile as the "median" of the upper 50%. So, all we need to do is sort the data beforehand, and obtain the median of the two halves of the data.

 

Here is a macro which does exactly that:

 

This macro is setup to take in a variable number of numeric fields, and output one record for each input field, which contains the field name, and the five required parts of data needed for a Box and Whisker chart.

 

After the sorts, the macro takes the first 50% of the records for each sort, grouping them by the field name. All of the Summarize tools also group by the field name, so that we can join all of it back together again once the quartiles have been created.

 

I have included a module package with a sample module which uses the macro, and demonstrates some features of the new Charting tool also.

 

It can be found here.