I am trying to devise a way to systematically remove variables based on excessive correlation. The Pearson tool allows me to create an n x n matrix of correlations for all variables; however, there doesn't appear to be an obvious way to remove a correlation pairing based on a threshold value (e.g. remove if >0.8).
I have also attempted to do the same using Spearman's coefficient, but the Association Analysis tool outputs a series of infographics displaying the relationships and appears to be geared towards data investigation rather than processing. If there is a way to solve the above problem using Spearman's, I would be interested in that as well.
If I understand your question, you want to systematically filter out collinear variables based on a correlation result. Is that right? The challenge is that the Pearson tool only compares one pair of variables, whereas the Association Analysis tool, which does multiple pairwise comparisons, only outputs a report, not the data.
Try the Pearson correlation Matrix macro from Laszlo Dobiasz in the Gallery.
With it you can get the correlation values either as a matrix or as a long list. Then you can use a filter to remove whatever variables you want (note that you will have 4 rows for each pair of variables), and join back to your original data.
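For anyone who wants to see the filtering logic spelled out outside of a workflow, here is a minimal Python sketch of the same idea: build the long list of pairwise Pearson correlations, then drop one variable from each pair above the threshold. This is not what the macro does internally; the function names (`pearson`, `long_correlations`, `drop_correlated`), the sample data, and the rule of dropping the later variable of each pair are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def long_correlations(columns):
    """Return one (var_a, var_b, r) row per unordered pair of variables."""
    return [(a, b, pearson(columns[a], columns[b]))
            for a, b in combinations(columns, 2)]


def drop_correlated(columns, threshold=0.8):
    """Greedily drop the later variable of each pair with |r| > threshold.

    Assumption: when a pair is too correlated, the variable that appears
    later in the input is the one removed. Pairs involving an already
    dropped variable are skipped.
    """
    dropped = set()
    for a, b, r in long_correlations(columns):
        if a in dropped or b in dropped:
            continue
        if abs(r) > threshold:
            dropped.add(b)
    return {name: vals for name, vals in columns.items() if name not in dropped}


# Hypothetical example data: two nearly identical columns and one unrelated.
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "score": [3, 9, 1, 7, 4],
}
kept = drop_correlated(data, threshold=0.8)
print(sorted(kept))
```

Which variable of a correlated pair to keep is a judgment call; keeping the first one is just the simplest rule, and you may prefer to keep whichever is more meaningful for your model.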
You correctly understand my intention, but there does appear to be an issue with the tool (the solution itself is good): it throws an error stating 'There were more than 16 records in the source', which I initially interpreted to mean that it was limited to 16 pairwise correlations, but looking at the output, it appears to have computed them all. Is it safe to ignore this error?