Hi,
I have a list of sales data by major US city by week and am trying to find those cities in which sales week over week changes are most correlated. I want to do this with a simple linear regression for each pair of cities). So, I am trying to make a loop that will regress each city on each of the other 49 cities (minus duplicates) to give me the R^2 in a table which I can sort to find the highest R^2.
I have read about using a batch macro, an iterative macro, the step wise tool, the iterative tool, or the dynamic selection tool, and am not sure exactly which tool is best. I would appreciate any help!
EDIT: Including dummy data for clarity:
Week | New York | San Francisco | Los Angeles | Houston | Chicago |
1 | 85% | 10% | 84% | 67% | 14% |
2 | 84% | 52% | 79% | 53% | 79% |
3 | 99% | 27% | 48% | 1% | 50% |
4 | 39% | 75% | 79% | 14% | 79% |
5 | 72% | 62% | 100% | 79% | 41% |
6 | 57% | 97% | 30% | 63% | 89% |
7 | 27% | 64% | 79% | 89% | 86% |
8 | 71% | 34% | 14% | 39% | 26% |
All combinations of regressions:
Y | X |
New York | Chicago |
New York | Houston |
New York | Los Angeles |
New York | San Francisco |
San Francisco | Chicago |
San Francisco | Houston |
San Francisco | Los Angeles |
Los Angeles | Chicago |
Los Angeles | Houston |
Houston | Chicago |
I created the attached macro to loop through regressors, but I can't figure out how to then change the dependent variable and loop for additional regressors.
Thank you,
Trey
Solved! Go to Solution.
I solved this by:
Joining the list of cities on itself,
removing duplicate pairs,
splitting the pairs across two columns
creating a macro that feeds in one column for indep and one column for dep variables