

Output difference of Association Analysis and Spearman Correlation tools

qqf
5 - Atom

I was using both the Association Analysis and Spearman Correlation tools to analyze the same variables. I expected the same correlation outputs, but they differ. Can anyone help me figure out why?

 

Thank you. 

jamielaird
14 - Magnetar

Hi @qqf 

 

This is very interesting. As far as I can tell the result from the Association Analysis tool is correct and the result from the Spearman Correlation tool is incorrect.

 

I tested this by recalculating the result in this online calculator:

 

https://www.socscistatistics.com/tests/spearman/default2.aspx

 

Looking at the tools themselves, Association Analysis uses the Hmisc package within R to calculate the result, whereas the Spearman Correlation tool uses a combination of Sort, Filter, Join and Summarize tools. Without a better understanding of the underlying mathematics I can't say exactly what is incorrect (if indeed anything is), but you should be able to troubleshoot this by working through the tools on the right-hand side of the workflow.

 

[Image: Screenshot 2019-05-05 at 17.24.33.png]

 

If there is an error in the calculation I suspect it's in the Formula tool 29:

 

[Image: Screenshot 2019-05-05 at 17.27.27.png]

 

If you have a better understanding of the mathematics you might be able to make more sense of this. If you do feel there is an error (and that certainly seems to be the case) I recommend flagging this with the support team via support@alteryx.com so they can look into it further and if necessary release a fix.

 

-----
If I've solved your problem please consider marking this solution as accepted. Thank you!

giscribson
5 - Atom

I was having this same issue. I believe the Spearman Correlation tool is incorrect because it ranks the fields of interest in descending order rather than ascending, which is what the definition calls for. This matters when there are repeated values: sorting does not change the relative order of tied values, but each one is still assigned a rank based on its position in the sorted list. As a result, for repeated values the difference in ranks (which is later squared) can be off by one.

 

Given that there is no prescribed order for repeated data values, and their default order is arbitrary, either method seems fine, though by definition the tool is currently wrong.
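
To make the effect of ties concrete, here is a minimal Python sketch. It is not the Alteryx macro or the Hmisc code. The function names and the sample data are made up for illustration, and it assumes the tool assigns ranks by sorted position and then applies the 1 - 6*sum(d^2)/(n*(n^2-1)) shortcut, which is only exact when there are no ties. The conventional tie-aware approach gives tied values the average of their ranks and takes the Pearson correlation of the ranks.

```python
from math import sqrt

def average_ranks(values):
    """Ascending ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def ordinal_ranks(values, descending=False):
    """Rank by sorted position; ties keep their input order (stable sort)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def spearman_tie_aware(x, y):
    """Spearman as Pearson on average ranks (handles ties)."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_shortcut(rank_x, rank_y):
    """The d-squared shortcut formula; exact only when there are no ties."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical sample data with one repeated value in x.
x = [1, 2, 2, 3]
y = [1, 2, 3, 4]

tie_aware = spearman_tie_aware(x, y)
ascending = spearman_shortcut(ordinal_ranks(x), ordinal_ranks(y))
descending = spearman_shortcut(ordinal_ranks(x, descending=True),
                               ordinal_ranks(y, descending=True))

print(round(tie_aware, 4))   # 0.9487
print(round(ascending, 4))   # 1.0
print(round(descending, 4))  # 0.8
```

With this data, the two tied values in x get ranks 2 and 3 in both sort directions, but the matching y values swap ranks when the direction flips, so the rank differences change from 0 to plus or minus 1. That is the off-by-one effect described above, and it is one plausible reason the two tools disagree whenever the data contains repeated values.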
