The strategy is to make the correlation between the Predictor and Reputation as high as possible at every step. The variable that correlates most highly with Reputation is the one that predicts Reputation best on its own, so start with it. Then look for the next highest, and so on.
Here are the correlations among the variables that are common to Colleges230 and to CollegesAll.
Pearson Product-Moment Correlation

|           | zRepu  | zAccRate | zGradRate | zTop10% | zTest75 | zTest25 |
|-----------|--------|----------|-----------|---------|---------|---------|
| zRepu     | 1.000  |          |           |         |         |         |
| zAccRate  | -0.656 | 1.000    |           |         |         |         |
| zGradRate | 0.804  | -0.600   | 1.000     |         |         |         |
| zTop10%   | 0.746  | -0.708   | 0.762     | 1.000   |         |         |
| zTest75   | 0.812  | -0.708   | 0.842     | 0.845   | 1.000   |         |
| zTest25   | 0.825  | -0.731   | 0.853     | 0.823   | 0.961   | 1.000   |
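As a rough sketch of the "start with the highest, then the next highest" idea, the snippet below ranks the candidate variables by their correlation with zRepu, using the values in the first column of the table above. Ranking by absolute value is an assumption made here (a strong negative correlation is as useful for prediction as a strong positive one), and the names `r_with_reputation` and `ranked` are illustrative only.

```python
# Correlations with zRepu, copied from the first column of the table above.
r_with_reputation = {
    "zAccRate": -0.656,
    "zGradRate": 0.804,
    "zTop10%": 0.746,
    "zTest75": 0.812,
    "zTest25": 0.825,
}

# Assumption: rank by absolute value, so that a strong negative
# correlation counts as much as a strong positive one.
ranked = sorted(
    r_with_reputation.items(),
    key=lambda item: abs(item[1]),
    reverse=True,
)

# Print the candidates in the order they would be tried.
for name, r in ranked:
    print(f"{name:10s} {r:+.3f}")
```

This only orders the candidates one at a time. Because several of them are strongly correlated with each other (zTest75 and zTest25 correlate 0.961), the second variable on the list may add less to the Predictor than its own correlation with Reputation suggests.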
Notice that the correlations are the same whether we correlate z-scores or raw scores. Why? Because the formula for correlation converts every score to a z-score before summing the products, and the z-score of a z-score is itself.
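The point can be checked numerically with a minimal sketch. The numbers below are made up for illustration and are not taken from the colleges data; the helper names `z_scores` and `pearson_r` are hypothetical, and the population standard deviation (dividing by n) is assumed throughout.

```python
import math

def z_scores(values):
    """Convert raw scores to z-scores using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Pearson r as the mean of the products of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Made-up numbers for illustration only; not the colleges data.
grad_rate = [62.0, 71.0, 80.0, 88.0, 95.0]
reputation = [2.1, 2.9, 3.0, 3.8, 4.6]

# Correlate the raw scores, then correlate their z-scores.
r_raw = pearson_r(grad_rate, reputation)
r_z = pearson_r(z_scores(grad_rate), z_scores(reputation))

# Standardizing scores that are already z-scores leaves them unchanged.
z_once = z_scores(grad_rate)
z_twice = z_scores(z_once)

print(math.isclose(r_raw, r_z))
print(all(math.isclose(a, b) for a, b in zip(z_once, z_twice)))
```

Both checks should print `True`: z-scores already have a mean of 0 and a standard deviation of 1, so standardizing them again changes nothing, and the correlation is computed from the same z-scores either way.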
Pearson Product-Moment Correlation

|          | Repu   | AccRate | GradRate | Top10% | Test75 | Test25 |
|----------|--------|---------|----------|--------|--------|--------|
| Repu     | 1.000  |         |          |        |        |        |
| AccRate  | -0.656 | 1.000   |          |        |        |        |
| GradRate | 0.804  | -0.600  | 1.000    |        |        |        |
| Top10%   | 0.746  | -0.708  | 0.762    | 1.000  |        |        |
| Test75   | 0.812  | -0.708  | 0.842    | 0.845  | 1.000  |        |
| Test25   | 0.825  | -0.731  | 0.853    | 0.823  | 0.961  | 1.000  |
The next table shows the steps I went through as I constructed my final Predictor formula, trying to make the highest possible correlation between Predictor and Reputation.