cancel
Showing results for
Did you mean:

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Upgrade Alteryx Designer in 10 Steps

Debating whether or not to upgrade to the latest version of Alteryx Designer?

SOLVED

Finding Outliers - where to start?

Bolide

Hi, I am working on a project were I have weights of products shipped. I am trying to identify extreme outliers by product that do not fit into the normal distribution for that products actual weight.

This has 2 purposes; investigate why they are outliers to identify issues and to remove for average weight calculation by product for further downstream needs.

Can someone point me in the right direction/method I should consider so I can setup a marco to process each product to identify outliers so I can remove them? Are there R functions or exsisting alterxy workflows that can easily accomplish?

Thanks,

Anthony

Alteryx Certified Partner

Generally, for normally distributed data, a Z-score or median absolute deviation (MAD) is the quickest and easiest way to identify potential outliers.

Both of these can be calculated easily inside of Alteryx. Try looking for Z > 3 or MAD > 3.5. Again, these are only starting points. Outlier detection is a complicated issue.

http://www.statisticshowto.com/how-to-calculate-a-z-score/

http://www.statisticshowto.com/median-absolute-deviation/

Alteryx Alumni (Retired)

Documentation on what is an "outlier" indicates that it can be rather nebulous in the definition. I've seen it be considered "one and a half times the size of the Interquartile Range" or "greater than 3 standard deviations" (and a few other more complex calculations).

So you would need to decide what you would consider the outlier for your specific data.

That said, once you determine the min/max range for what is "normal", then it is an easy process to append those fields to your data and filter on whatever lies outside of those values.

Quasar

When I taught statistics, I taught my students to plot and visualize the data before doing anything else. So, I'd recommend creating some plots to see what you'd consider outliers. A good software will also let you select data points to investigate further.

You could try using Alteryx's plotting tools in the Data Investigation tab, use R as a stand alone data exploration tool, or use something like Tableau that lets you interact with your data to really dig into it.

Bolide

I started with standard deviation and it is being skewed by some big outliers so I am trying to figure out the best way to deal with them.

Most of the data is a tight grouping and random ones that are way off that i need to flag for review so it might be a combination of sd/average.

I will take a look at some of the options to see what works best.

Thanks,

Anthony

Highlighted
Sr. Community Content Manager

Here's a quick and dirty outlier detection tool using the z-score methodology to help get you started.

Bolide

Neil thanks - that caught my over obvious ones - I will test on some tighter groups but this should get me started.

Thanks,

Anthony

Bolide

Ok, I think the outlier macro will work after adding group by feature. I can now quickly look for outliers by product and test by looking at new average for the false outliers. Seems like this would work well with a iterative macro to cycle through a range and then develop some thresholds to identify borderline outliers.(another day)

I have attached the group by version for future users.

Labels