Hi all
I am trying to run a Pearson correlation analysis to estimate how likely my customers are to purchase another one of my products (SKUs), based on the overlap between their purchases. For example, see the table below. You could argue Customer C may well buy product 1000001261, since Customers A and B both have it. I have other columns in addition to volume, but I would like volume to be the key factor.
When I run the PCC calculation, it asks me for two numerical columns. I tried duplicating the same file so that I would have two volume columns, but it is still not working properly (or I don't know how else to describe it!). Thank you!
| Customer | SKU | Volume |
| --- | --- | --- |
| Customer A | 1000006237 | 516 |
| Customer A | 1000006863 | 315 |
| Customer A | 1000001254 | 326 |
| Customer A | 1000001260 | 385 |
| Customer A | 1000001261 | 815 |
| Customer B | 1000001254 | 125 |
| Customer B | 1000001260 | 355 |
| Customer B | 1000001261 | 555 |
| Customer C | 1000001254 | 295 |
| Customer C | 1000001260 | 852 |
| Customer C | 1000006237 | 755 |
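
To make the "two numerical values" requirement concrete, here is a minimal sketch of one possible way to reshape the table above so that each SKU becomes its own numeric column, which Pearson can then compare pairwise. It assumes pandas is available, and it assumes a SKU a customer has never bought counts as a volume of 0; both are assumptions for illustration, not part of my current setup. The variable names (`wide`, `sku_corr`, `missing_for_c`) are made up for this example.

```python
import pandas as pd

# The sample data from the table above, in long format
rows = [
    ("Customer A", "1000006237", 516),
    ("Customer A", "1000006863", 315),
    ("Customer A", "1000001254", 326),
    ("Customer A", "1000001260", 385),
    ("Customer A", "1000001261", 815),
    ("Customer B", "1000001254", 125),
    ("Customer B", "1000001260", 355),
    ("Customer B", "1000001261", 555),
    ("Customer C", "1000001254", 295),
    ("Customer C", "1000001260", 852),
    ("Customer C", "1000006237", 755),
]
df = pd.DataFrame(rows, columns=["Customer", "SKU", "Volume"])

# Reshape to one row per customer and one column per SKU.
# Assumption: a missing Customer/SKU pair is treated as volume 0.
wide = df.pivot_table(
    index="Customer", columns="SKU", values="Volume",
    aggfunc="sum", fill_value=0,
)

# Pairwise Pearson correlation between SKU columns (SKU x SKU matrix)
sku_corr = wide.corr(method="pearson")

# SKUs that Customer C does not hold yet, i.e. candidates to look up in sku_corr
missing_for_c = wide.columns[wide.loc["Customer C"] == 0].tolist()

print(wide)
print(sku_corr.round(2))
print(missing_for_c)
```

With only three customers the coefficients are not meaningful; the point is just the shape of the data that the correlation expects. Whether filling gaps with 0 is appropriate, or whether those pairs should be left as missing, depends on how "not purchased" should be interpreted.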