Hello,
I'm trying to run a pearson correlation, and everything goes well except when my data contains nulls.
My data looks like this (actual data attached):
var1 | var2 | var3 |
5 | 3 | |
6 | 2 | 5 |
7 | 2 | 6 |
9 | ||
5 | 1 | 3 |
6 | 5 | 2 |
When entering this into pearson, the outcome says
FieldName | var1 | var2 | var3 |
var1 | 1 | ||
var2 | 1 | ||
var3 | 1 |
the pearson documentation recommends to use the imputation tool and replace the null values, but it's important that the calculation is identical to the excel calculation.
in Excel, a record is ignored when only one of the two values are null (cf screenshot below)
Is this a setting that I can set to the pearson tool? Or can someone suggest a method to achieve this?
I imagine a batch macro might do the trick, but that sounds kinda sloppy for what I hope is a button that I'm missing or something :) ?
Thanks,
Mikis
Solved! Go to Solution.
Hi @Mikis
Pearson correlation simply doesn't work when there are NULL values. The excel function filters out rows with NULLs and only provides a correlation values for two data sets (variables only).
The only way I can think of replicating this would be using a batch macro - selecting all variable pair combinations one at a time, filtering out rows with NULLs, and running the Pearson correlation.
The other way would be to use Imputation / Data cleansing (replacing NULLS with 0s) before running the correlation but as you correctly pointed out, the correlation value would be different.
Michal
Hi @MichalM
Thanks, I have built what I think you mean and added it in attachment.
The problem is that it's really slow. The previous pearson correlation with 33 variables took less than a second, this thing takes almost 2 minutes (since the batch macro with 33 variables needs to run 33² = 1089 times).
This correlation is just the first step of a whole flow. If this already takes 2 minutes then I'll probably have to ask my users to just correlate in excel and start the flow from there. But if someone can use it, the package attached might help.
Regards,
Mikis