I used both the Association Analysis and Spearman Correlation tools to analyze the same variables. I expected the same correlation output from both, but the results differ. Can anyone help me figure out why?
Thank you.
Hi @qqf
This is very interesting. As far as I can tell, the result from the Association Analysis tool is correct and the result from the Spearman Correlation tool is incorrect.
I tested this by recalculating the result in this online calculator:
https://www.socscistatistics.com/tests/spearman/default2.aspx
Looking at the tools themselves, Association Analysis uses the Hmisc package in R to calculate the result, whereas the Spearman Correlation tool uses a combination of Sort, Filter, Join and Summarize tools. Without a better understanding of the underlying mathematics, I can't say exactly what is incorrect (if indeed anything is), but you should be able to troubleshoot this by working through the tools on the right-hand side of the workflow.
If there is an error in the calculation, I suspect it is in Formula tool 29.
If you have a better understanding of the mathematics, you might be able to make more sense of this. If you do feel there is an error (and that certainly seems to be the case), I recommend flagging it with the support team via support@alteryx.com so they can look into it further and, if necessary, release a fix.
-----
If I've solved your problem please consider marking this solution as accepted. Thank you!
I was having the same issue. I believe the Spearman Correlation tool is incorrect because it ranks the fields of interest in descending order rather than ascending, as the definition specifies. This makes a difference when values repeat: sorting does not change the relative order of repeated values, but each one still receives a rank based on its position in the sorted list. Thus, for repeated values, the difference in ranks (which is later squared) will be off by one.
Given that there is no prescribed order for repeated data values, and their default order is arbitrary, either method seems fine, though by definition the tool is currently wrong.
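To make the effect of repeated values concrete, here is a minimal Python sketch. It is not the tool's actual workflow: the function names and sample data are made up for illustration, and it assumes the tool assigns ranks by sorted position. It compares that position-based ranking (ascending and descending) with the common convention of giving tied values the average of their ranks and then computing the Pearson correlation of the ranks.

```python
from math import sqrt

def average_ranks(values):
    """Ascending ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # +1 because ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def ordinal_ranks(values, descending=False):
    """Rank by sorted position; ties keep their input order (stable sort)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def spearman_tie_aware(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_shortcut(rank_x, rank_y):
    """Shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only without ties."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical sample data: x contains one repeated value (20).
x = [10, 20, 20, 30, 40]
y = [1, 3, 2, 4, 5]

print("average ranks + Pearson:", spearman_tie_aware(x, y))
print("position ranks, ascending: ",
      spearman_shortcut(ordinal_ranks(x), ordinal_ranks(y)))
print("position ranks, descending:",
      spearman_shortcut(ordinal_ranks(x, descending=True),
                        ordinal_ranks(y, descending=True)))
```

On this data the three calculations give three different values, which matches the behavior described above: with position-based ranks the result depends on sort direction and on the arbitrary order of the repeated values. With average ranks the result does not depend on either, which may be why a tie-aware calculation disagrees with the tool.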