Hi everyone,
This post is partly a tip and partly a question 🙂
I need to calculate the correlation coefficients between columns dynamically, because the number and names of those columns can change unpredictably from one run to the next.
When I use the standard "Pearson Correlation" tool, the macro crashes every time the names (and/or number) of columns change.
To bypass this problem, I use a Python block with a basic "dataframe.corr()" call to get my correlation matrix.
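For anyone who wants to try it, here is a minimal sketch of that approach. The helper name `correlation_matrix` and the sample data are made up for illustration; inside the Alteryx Python tool the input would presumably come from `Alteryx.read("#1")` and go back out with `Alteryx.write(...)`, which I mention only in comments so the snippet runs anywhere pandas is installed.

```python
import pandas as pd


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation of whatever numeric columns are present.

    No column name is hard-coded, so the function keeps working when
    the number or names of the columns change.
    """
    numeric = df.select_dtypes(include="number")  # ignore text columns
    return numeric.corr(method="pearson")


# Hypothetical sample data; in the Python tool this would be something like:
#   from ayx import Alteryx
#   sample = Alteryx.read("#1")
sample = pd.DataFrame(
    {
        "X4": [1.0, 2.0, 3.0, 4.0],
        "X5": [2.0, 1.0, 4.0, 0.5],
        "X6": [2.0, 4.0, 6.0, 8.0],  # exactly 2 * X4, so corr(X4, X6) is 1
        "label": ["a", "b", "c", "d"],  # non-numeric, skipped
    }
)

corr = correlation_matrix(sample)
print(corr)
```

Filtering with `select_dtypes` is a design choice on my side: it avoids errors when a text column slips into the input, at the cost of silently ignoring it.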
This solves the problem (and it might help others facing the same issue!), but I was wondering whether there is a better/proper way to do it?
In addition, the standard Alteryx tool outputs a full matrix, including a "FieldName" column:
| FieldName | X4 | X5 | X6 |
| X4 | 1 | -0.494204753335428 | 1 |
| X5 | -0.494204753335428 | 1 | -0.494204753335428 |
| X6 | 1 | -0.494204753335428 | 1 |
Whereas the Python block drops the "FieldName" column (even though the output shown in the Jupyter notebook is a full matrix...):
| X4 | X5 | X6 |
| 1 | -0.494204753335428 | 1 |
| -0.494204753335428 | 1 | -0.494204753335428 |
| 1 | -0.494204753335428 | 1 |
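A likely explanation (an assumption on my part, not something I have confirmed in the Alteryx documentation) is that the row labels of the correlation matrix live in the DataFrame index, and the index is not written out as a column. A minimal sketch of a workaround is to turn the index into a regular "FieldName" column before writing; the sample data below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the real input.
df = pd.DataFrame(
    {
        "X4": [1.0, 2.0, 3.0, 4.0],
        "X5": [2.0, 1.0, 4.0, 0.5],
        "X6": [2.0, 4.0, 6.0, 8.0],
    }
)

corr = df.corr()  # row labels (X4, X5, X6) are stored in the index

# Name the index "FieldName" and move it into an ordinary column,
# so the labels survive when only the columns are written out.
corr_out = corr.rename_axis("FieldName").reset_index()
print(corr_out)

# In the Python tool, the result would then be sent to an output anchor,
# e.g. Alteryx.write(corr_out, 1) (assumed usage, not run here).
```

With this, the output has the same layout as the standard tool: a "FieldName" column followed by one column per field.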
I would be happy to hear your thoughts! (The corresponding simplified workflow is attached.)
Pierre-Louis
Edit: Correlation matrix corrected; thanks @chrisha