Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Outlier Standard Deviation check for 100 plus data count

prveenk86kumar
7 - Meteor

Hi All, I have outlier check built in excel which i want alteryx to do (i am new to alteryx) so i have 100 plus clients and i need to check on monthly basis if my client has breached 2 SD and if there is breach it highlights in red, could someone help me to do the same in alteryx.

 

excel formula i am currently using is IFERROR((B27-AVERAGE(B$2:B27))/STDEVP(B$2:B27),"")

 

attached the spreadsheet for your reference.

7 REPLIES 7
RishiK
Alteryx
Alteryx

@prveenk86kumar have a look at the attached workflow I have started for you. I am outputting a "Check Outlier" column after calculating the Average and SD. 

In Alteryx you can replicate the IFERROR function but checking if the outcome of the calculation is NULL.

See if this works for your purpose.

prveenk86kumar
7 - Meteor

Hi RishiK, thanks for the response- can we have outlier check part of Std dev colum - i have 100+ columns which i need to run this so scrolling will be little tuff.

 

Also i am getting Null values were there is actual breach any change in the formula is required is it.

 

By any chance can we add colour if breach is more then 2 SD then red if not green

prveenk86kumar
7 - Meteor

Anyone can you help me with the solution

apathetichell
19 - Altair

@prveenk86kumarthis is is doable - but it would be really helpful if you could provide more realistic sample data...

 

Attached is a version with Dynamic Select it finds all numeric columns and looks for those with values outside of 2 stdev.

 

A few notes - do you need stdev of population or of a sample? Have you checked out @jdunkerley79 's work with abacus?

https://jdunkerley.co.uk/2019/04/09/an-overview-of-the-alteryx-abacus-add-in/ ....  normdist function could do some of the work for you.

 

Also - regarding color - alteryx uses color in tables. So if you are rendering your data - totally you can use color for red/green. otherwise not really.

apathetichell
19 - Altair

My workflow does the computation and finds those with the stdev breach - in terms of highlighting you need to make decisions regarding how you want your data to look at the end of your workflow and if you want to use the render tool et al... there also may be a  way to change the color via "DYNAMIC HARD CODING CHAINED APP SORCERY" (rtm) but I'd prefer to not got into that.

prveenk86kumar
7 - Meteor

@apathetichell thanks for the workflow, i have attached sample data basically below is what i am trying to achieve

 

1) Input file is my raw data which has data for 100 clients and i need to pick the values highlighted in yellow and paste into sample SD file month on month - recurring basis meaning every month i need to add data to sample SD file (old data would be there already)

 

2) my alteryx should just calculate SD formula used in excel is -***IFERROR((B27-AVERAGE(B$2:B27))/STDEVP(B$2:B27),"")***

for that month..not for previous month since we would have already calculated previously.

 

3) If there is SD breach of more then 2 then highlight it by red else be it Green.

 

let me know if i am making sense with my  explanation  

apathetichell
19 - Altair

hi @prveenk86kumar,

 

FYI you use stdevp in excel - Alteryx gives you STDEVS which is probably more statistically correct in most cases. Here are some comparable values for a sample of 10000 randomly generated integers in Alteryx and copied to excel:

 

excel:

2893.207867 stdev.p
2893.352538 stdev.s

 

Alteryx:

STDEV: 2,893.352538

 

Alteryx Variance:

8,371,488.910266

 

If you normalize this variance to STDEV of a P by multiplying by (N/(N+1) where N is the total number of observations you can get variance of a population - the square root of which is what you need (using our Alteryx/Excel example) we get:

8370651.8452893.208

 for variance and for stdev.

 

For you if you need to go with stdevp - you can change your formula to reflect it.

 

2) I strongly recommend you look for the values which are more than 2 stdev's' away from mean - same as your formula and same as what I did - except it flags the 2's specifically. I'd throw this into a boolean variable because...

 

3) To change colors you'll need to review rules in the table tool. Since =True is much more straight forward than a > in table formula - stick with the true. Check out this weekly challenge to see how to implement column rules for values...

https://community.alteryx.com/t5/Weekly-Challenge/Challenge-249-Highlights/td-p/688115

 

that leaves me with 1)

If you want colors - you have to use table/render (or perhaps dynamic hard coding sorcery)... When you render you overwrite whatever is there. Like the entire file. Is that always the case? That's always the case using Alteryx to use render. Are there workarounds? maybe. in like python or something.

 

 

Labels