Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!
Submitting an Idea?

Be sure to review our Idea Submission Guidelines for more information!

Submission Guidelines

Average Ignoring Nulls

In the Formula Tool, there is an Average() function which can be used to take the average/mean of multiple columns or expressions. This function treats null values as zeroes. This was a surprise/dissapointment to me as I am used to other applications & systems where nulls are ignored, for example Excel. It would be useful to have either an AverageIgnoreNulls() function or an optional extra parameter to Average() which specifies that nulls should be ignored rather than treated as zeroes.

 

When wishing to average a small number of columns and ignore nulls, a formula can be constructed using Iif(IsNull([Column1]),0,[Column1]) for each column to calculate the total, and Iif(IsNull([Column1]),0,1) for calculating the count. This quickly becomes unwieldy for more than 2 or 3 columns.

13 Comments
tom_montpool
12 - Quasar
PaulRB
8 - Asteroid

Hi Tom, as per the guide, I did a search before posting, came accross that thread and read it hoping for a solution. But the solution presented there is not suitable for the situation I am describing above, if I understood it correctly.

 

It seems to assume that the user needs to take the average of the same column over many rows. I am talking about taking the average of several colunms, considering each row in isolation. Have I misunderstood the suggested solution?

 

In my case, for each row of data, taken in isolation, I am calculating several estimates for the same quantity, using several different algorithms. I then need to take the average of those estimates. An estimate could be zero, and I need to include it when I take the average. However, some of the algorithms may not be able to calculate an estimate, in which case the result is null. When that happens, I need to ignore it when I take the average. Ignoring a value is not the same as treating it as zero. A zero value drags down the average. Ignoring it does not.

 

 

JohnJPS
15 - Aurora

I did it using Alteryx tools but cannot attach a workflow in the ideas section; so here's a screenshot:

 

Capture.PNG

 

Basically, inside the grey box:

  1. add a RecordID
  2. anchor on Record ID and transpose everything else to columns
  3. summarize Group By Record ID and do both a Sum and a CountNonNull
  4. Calculate avg = Sum / Count
  5. Join back to original, on Record ID
  6. Select original Fields and the average

This is also be easy to turn into a standard macro:

Capture1.PNG

 

 

PaulRB
8 - Asteroid

Thanks John, I will have a play with this idea. 

 

However, this is bound to be less efficient than my suggested new/enhanced function, because it temporarily creates several times more data rows to be processed, so my suggestion still has merit, I think.

JohnJPS
15 - Aurora

@PaulRB -- Agreed it's a worthwhile idea; sometimes it's good to have a quick workaround. This is also a great example of something that can be worked into a Formula AddIn, however that takes a little bit of actual C coding.  I did the work today, and will try to get it somewhere convenient (e.g. GitHub) perhaps tomorrow; this will provide the Formulaic approach you desire.

JohnJPS
15 - Aurora

OK, I created the Formula AddIn to do this: this works on the latest version (10.5) but might not work on earlier versions.  Basically just drop the .xml and .dll files contained here (https://github.com/JohnJPS/Alteryx/tree/master/dll-xml) into your Alteryx install dir, subfolder path bin\RuntimeData\FormulaAddin... on my computer it's "C:\Program Files\Alteryx\bin\RuntimeData\FormulaAddIn"... then just restart Alteryx, and your formula tool will now have "AverageNonNull" right next o "Average" under the Math functions.

 

Some caveats here are you still have to type in all the columns... the earlier macro approach you can just use the select tool to click on the desired columns to include. But either way I've been wanting to write a Formula AddIn so this was good practice... (good practice with GitHub as well; the source code is in the "Source" folder also under my Alteryx repository.)  Thanks!

PaulRB
8 - Asteroid

Thanks John, your work on this is much appreciated. Unfortunately I'm not in a position to test it out at the moment. I quickly discovered that in my corporate environment, I don't have write access to the ...Alteryx\bin folder on my PC. I will raise a service request with my IT department, but they may regard my request as a security concern. If and when I am able to test it I'll let you know.

jdunkerley79
ACE Emeritus
ACE Emeritus

 @JohnJPS nice to see someone else using the SDK :)

 

There is a bug in the SDK around NULLs in formula tool. Will only be correct on first row of data.

 

My Formula AddIns have a workaround @Ned helped me come up. 

 

Current release https://github.com/jdunkerley/AlteryxFormulaAddOns/releases/tag/v1.1 has Coalesce, Avg, Count and Sum functions.

Source code on GitHub.

 

Install script will need access to C:\Program Files\Alteryx\bin though.

JohnJPS
15 - Aurora

@jdunkerley79, agreed and thanks fro the tip; I did get some quirky results a lot like what you mention, before noticing the problems went away in version 10.5.  So hopefully that issue is fixed: good reason to move up if you haven't already.

 

 

I'm wanting to move onto the Engine SDK and see how detailed I can get with my own tool creation, however hit a few snags there as well. I think I'll start a discussion to knock around thoughts.

 

jdunkerley79
ACE Emeritus
ACE Emeritus

 @JohnJPS, Ned fixed it on a Webex call with me (thats awesome support for you), but thought was after the code was locked in for 10.5. Basic bug was that it didnt reset the IsNull flag so once a variable was marked as null remained null for all following rows. Will see if can remove that section of code though such a simple workaround may keep for compatibility with old versions.

 

Happy to talk you over the framework I have stuck together. Is C# based but can write a simple addin (e.g. HexBins) taking 1 or more inputs and producing 1 or more outputs in about 30 mins now. DM me if you fancy a webex/hangout on it.