Hello Alteryx Community!
I am attempting to determine if values in my data-set are outliers. Of course, to do this, I need to calculate standard deviation first.
My data can have occurrences where different customers can purchases the same items, so I created an ID to join the two together (SmithA1 and Brown A1). Grouping by this CustomerID, I need to calculate if $ Error for any given week in a range of weeks is outside of 3 St. Deviations from the norm. If so, I would like to deem the week an outlier.
I have attached an image below, as well as an example of the file I am using. The columns with blue headers are how my data is already set up, and I would like to use Alteryx to calculate the columns with yellow headers. I have attemped to solve this using the Summarize Tool, but am running into a wall.
Thank you in advance!
Solved! Go to Solution.
I attached a workflow that i think may help you with this. I did it two ways - grouping by weeks and another way by grouping item and CustomerItemID. I think you are looking to group by weeks but I added a different way as well. I used the summarize tool to find the standard deviation and joined it back together with the data.
Hope this helps!
Hi @aralbrecht
The solution provided by @bpatel is a good first step, however it's missing a key point. The data values are considered outliers if they lie outside the range bounded by 3 STDevs, on either side of the mean, assuming the distribution isn't skewed.
I started with @bpatel's workflow and added the mean(AVG in this case) into the Summarize tool and changed the final formula to check if [Dollar Error] is outside the 3 STDev range from the mean, resulting in
Note: Your original data was very flat and non of the values counted as outliers. I bulked up the middle range and added some obvious outlier data
Dan
Thank you both for your fast responses! However, I apologize as I forgot to include a key detail in my data-set when originally posting, only realizing this is an issue when studying both of your workflows...
I also have location data that I need to summarize by week (thinking of PivotTable in Excel, include image of logic in second image below).
Essentially, I need to:
Essentially, I am trying to determine which weeks are outliers for each CustomerID, but am running into an issue since there are multiple locations that need summed to determine what he weekly value is.
I have re-attached the file with location data.
Thank you again in advance!
Hi @aralbrecht
I think this is what your looking for. I summarized the weekly results before passing them to the rest of the procedure.
It wasn't clear whether you wanted the outliers by week or for the original records, so it does both. The top branch is by week and the bottom is all records including locations
Dan
Thank you both for you help! I slightly modified the solution to join the results together using an ID created using Customer-Item-Week, as I needed the result from the first stream in the workflow to append to each record in the second stream. I've included and image of it below.
Danilang i have an older verison of Altryx and isnt letting me open this attached file. I think is what i need to results i need. Is there any way to save the file differently ? Thanks Nick
Hi @Nickvv
If you open the .yxmd file that you get after the import process with Notepad, you can change the second line from
<AlteryxDocument yxmdVer="2019.2">
to what ever your versions is. Save it and you be able to open it in Alteryx
Dan
This what im getting when i open in notepad:
Hi @Nickvv
That's the .yxzp file that you downloaded from the community. You need to find the .yxmd file. It'll probably be in a subdirectory called Outliersv2.
Dan