Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Finding Outliers - where to start?

anthony
11 - Bolide

Hi, I am working on a project were I have weights of products shipped. I am trying to identify extreme outliers by product that do not fit into the normal distribution for that products actual weight.

 

This has 2 purposes; investigate why they are outliers to identify issues and to remove for average weight calculation by product for further downstream needs.

 

Can someone point me in the right direction/method I should consider so I can setup a marco to process each product to identify outliers so I can remove them? Are there R functions or exsisting alterxy workflows that can easily accomplish?

 

Thanks,

Anthony

 

 

 

 

 

 

 

 

 

 

10 REPLIES 10
michael_treadwell
ACE Emeritus
ACE Emeritus

Generally, for normally distributed data, a Z-score or median absolute deviation (MAD) is the quickest and easiest way to identify potential outliers.

 

Both of these can be calculated easily inside of Alteryx. Try looking for Z > 3 or MAD > 3.5. Again, these are only starting points. Outlier detection is a complicated issue.

 

For more information:

 

http://www.statisticshowto.com/how-to-calculate-a-z-score/

http://www.statisticshowto.com/median-absolute-deviation/

 

RodL
Alteryx Alumni (Retired)

Documentation on what is an "outlier" indicates that it can be rather nebulous in the definition. I've seen it be considered "one and a half times the size of the Interquartile Range" or "greater than 3 standard deviations" (and a few other more complex calculations).

So you would need to decide what you would consider the outlier for your specific data.

That said, once you determine the min/max range for what is "normal", then it is an easy process to append those fields to your data and filter on whatever lies outside of those values.

 

Philip
12 - Quasar

When I taught statistics, I taught my students to plot and visualize the data before doing anything else. So, I'd recommend creating some plots to see what you'd consider outliers. A good software will also let you select data points to investigate further.

 

You could try using Alteryx's plotting tools in the Data Investigation tab, use R as a stand alone data exploration tool, or use something like Tableau that lets you interact with your data to really dig into it.

anthony
11 - Bolide

I started with standard deviation and it is being skewed by some big outliers so I am trying to figure out the best way to deal with them.

 

Most of the data is a tight grouping and random ones that are way off that i need to flag for review so it might be a combination of sd/average.

 

I will take a look at some of the options to see what works best.

 

Thanks,

Anthony

 

 

NeilR
Alteryx Alumni (Retired)

Here's a quick and dirty outlier detection tool using the z-score methodology to help get you started.

anthony
11 - Bolide

Neil thanks - that caught my over obvious ones - I will test on some tighter groups but this should get me started.

 

Thanks,

Anthony

anthony
11 - Bolide

Ok, I think the outlier macro will work after adding group by feature. I can now quickly look for outliers by product and test by looking at new average for the false outliers. Seems like this would work well with a iterative macro to cycle through a range and then develop some thresholds to identify borderline outliers.(another day)

 

I have attached the group by version for future users.

Number4
8 - Asteroid

Perhaps a silly question, but what is the [#1] supposed to be in the formula tool?

sirelandcbre
6 - Meteoroid

[#1] here is gonna be the user-specified Z score threshold (made possible by connecting the Numeric Up/Down to the Formula tool's Input/Output anchor (magnifying glass/Q). #1 signifies the connection number; you'll see these when connecting interface tools throughout your workflows

Labels