Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Correlation Tool- Filtering Out Excessive Multi Collinearity

Stu1
6 - Meteoroid

Hello All,

 

I am trying to devise a way to systematically remove variables based on an excessive correlation, The Pearson tool allows me to create an n x n matrix of correlations for all variables, however, there doesn't appear to be an obvious way to remove a correlation pairing based on a threshold value (e.g. remove if >0.8).

 

I have also attempted to do the same but using Spearman's coefficient but the association analysis outputs a series of info-graphics displaying the relationships and appears to be geared towards data investigation rather than processing. If there is a way to undertake the above problem using Spearman's.

 

I have attached an example dataset below. 

 

Thanks in advance,

 

Stu

 

 

5 REPLIES 5
Stu1
6 - Meteoroid

Anyone got any ideas on this?

Stu1
6 - Meteoroid

Bump

terry10
11 - Bolide

@Stu1 

 

If I understand your question, you want to systematically filter out collinear variables based on a correlation result. Is that right? The challenge is that the Pearson tool only compares on pair of variables, whereas the Association analysis tool, which does multiple pairwise comparisons, only outputs a report, not the data. 

 

Try the Pearson correlation Matrix macro from Laszlo Dobiasz in the Gallery.

https://gallery.alteryx.com/#!app/Pearson-Correlation-Matrix/5e55ed0e8a933712c0404c30 

With it you can get the correlation values either as a matrix or as a long list.  Then you can use a filter to remove whatever variables you want (note that you will have 4 rows for each pair of variables), and join back to your original data. 

 

I've attached a workflow to get you started.

 

Does this solve your question?

 

 

 

 

Stu1
6 - Meteoroid

Thanks Terry.

 

You correctly understand my intention but there does appear to be an issue with the tool (the solution itself is good): it throws out at error stating 'There were more than 16 records in the source' which I initially interpreted to mean that it was limited to 16 pairwise correlations but looking at the output, it appears to computed them all. Is it safe to ignore this error?

 

Thanks,

 

Stuart

Laszlo_D
8 - Asteroid

Hi Stuart,

 

I am sorry for the late answer, I didn't notice this thread.

Yes, the macro was limited to 16 pairwise correlations. I updated the macro. It now can handle more than 16 variables. Version 3 is available in the gallery or can be downloaded directly from here.

Kind Regards
Laszlo

Labels