I have a data set (that I cannot share due to confidentiality) that has 3k part numbers and 29k total data points. I want to get the standard deviation by number. I used the summarize tool to get this data, but the standard deviation doesn't match excel or other calculations. When I just input the data points into a text input file the std dev matches excel. This must mean how it's summarizing by PN isn't matching my expectation.
PN | Data 1 | Data 2 |
1 | -0.75 | -0.65 |
1 | -0.39 | -0.63 |
1 | -0.65 | -0.78 |
1 | -0.76 | -0.69 |
Alteryx STD DEV | .210666 | .138852 |
Excel STD DEV | .172313 | .066521 |
Below is an image of the summarize tool for the Data 2
Hey @Brian_Denso,
There are two Standard Deviation options, I am wondering if you have selected the wrong one?
@Brian_Denso I am not sure why you arent seeing those #s in Alteryx. Using your sample #s above I got the matching Excel (when Excel uses STDEV.S formula) I tested Excel to see if they were being calculated differently from one to the other. However my Alteryx sample matches your Excel STDEV outputs...
@IraWatt
I've selected the Standard Deviation at the top. If I input the data into an Input list, as you've done, I get the correct deviation. However, in my data set the deviation doesn't match. Is it possible that the grouping by PN isn't creating the correct result?
Excel's STDEV.P gives me a value of .149227 and .057609 respectively for the two columns but even that doesnt match what you are seeing above... :(
@Brian_Denso Grouping should be fine, I would do a few sanity checks such as:
- check there are the same number of rows coming into Alteryx as Excel
- remove all empty rows in Alteryx
- SUM the rows in Alteryx and compare with the SUM in Excel
If they don't find anything try copy and pasting the data directly into Alteryx and join it with the data coming in from the data input tool to check the data is identical.
@Brian_Denso -- perhaps while your numbers are being visibly rounded to 2 decimals but are truly "longer"? Hence, when we copy/paste to text input it works exactly like excel?
I guess I know the issue, but still figuring out how to resolve.
Alteryx calculates STDEV.S by default while EXCEL has multiple options like STDEV.P, STDEV.S etc.
So when we do summarize and select standard deviation in Alteryx, we need a way to select STDDEV.P and not use the default one. Other option can be to use the excel formula using python.