Hi All,
I have a dataset that shows the running time of a machine. I want to remove the outliers from it, where the top 25% and bottom 25% of the values are treated as outliers.
I have attached a sample dataset.
Help is much appreciated.
Here is how you can do it.
Workflow:
1. Use the Sort tool to sort the data in ascending order.
2. Use the Record ID tool to assign a row ID to each record.
3. Use the Summarize tool to get the max row ID (the row count).
4. Use the Formula tool to calculate the row IDs at 25% and 75% of the row count.
5. Use the Append Fields tool to map the 25% and 75% row IDs back onto the main data.
6. Use the Filter tool to keep only the rows between the 25% and 75% row IDs. This removes the top and bottom 25%.
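For anyone who wants to check the logic outside of Alteryx, the steps above can be sketched in Python roughly as follows. This is only an illustration: the function name `trim_by_row_id`, the sample values, and the choice to drop the row sitting exactly on the 25% cut while keeping the one on the 75% cut are all assumptions, not something taken from the attached workflow.

```python
def trim_by_row_id(values, lower_frac=0.25, upper_frac=0.75):
    """Keep the middle rows of `values` by position after sorting."""
    ordered = sorted(values)                      # step 1: sort ascending
    numbered = list(enumerate(ordered, start=1))  # step 2: row IDs 1..n
    row_count = len(numbered)                     # step 3: max row ID
    lower_id = row_count * lower_frac             # step 4: 25% row ID
    upper_id = row_count * upper_frac             #         75% row ID
    # steps 5-6: compare every row ID with the two cut-offs and filter
    # (boundary handling here is an assumption)
    return [value for row_id, value in numbered if lower_id < row_id <= upper_id]


# Hypothetical running times, not the attached sample dataset
sample = [12, 3, 45, 7, 30, 22, 18, 9]
trimmed = trim_by_row_id(sample)
print(trimmed)  # prints [9, 12, 18, 22]
```

With 8 rows the cut-offs are row IDs 2 and 6, so rows 3 to 6 of the sorted data are kept. When the row count is not divisible by 4, the cut-offs are fractional and the number of rows dropped at each end depends on how you round, so it is worth deciding that explicitly in the Formula tool.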
Hope this helps : )
The Summarize tool can calculate percentiles directly, so you can get the 25th and 75th percentile values from it and use them in a Filter tool. Otherwise it's very similar to @atcodedog05's solution.
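As a rough sketch of the same idea in Python, the version below filters on percentile values instead of row IDs. It uses `statistics.quantiles` from the standard library; the function name `trim_by_percentile` and the sample values are assumptions for illustration, and the interpolation Python uses may not match how the Summarize tool computes percentiles, so the cut-off values can differ slightly.

```python
import statistics


def trim_by_percentile(values):
    """Keep values between the 25th and 75th percentile, inclusive."""
    if len(values) < 2:
        return list(values)  # quantiles need at least two data points
    # n=4 returns the three quartile cut points: Q1, median, Q3
    q1, _median, q3 = statistics.quantiles(values, n=4)
    # Equivalent of the Filter tool: value >= Q1 AND value <= Q3
    return [value for value in values if q1 <= value <= q3]


# Hypothetical running times, not the attached sample dataset
sample = [12, 3, 45, 7, 30, 22, 18, 9]
kept = trim_by_percentile(sample)
print(sorted(kept))  # prints [9, 12, 18, 22]
```

One difference from the row ID approach: filtering on values keeps all ties at the cut-off, so the result is not guaranteed to be exactly half of the rows.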
Ollie