Building my Predictor Equation for Assigning Reputation Scores to Colleges that Don’t Have One

The strategy is to make the correlation between the Predictor and Reputation as high as possible at every step of the way. The variable that correlates most strongly with Reputation is the one that predicts Reputation best on its own, so start with it. Then look for the variable with the next-highest correlation, and so on.

Here are the correlations among the variables that are common to Colleges230 and to CollegesAll.

Pearson Product-Moment Correlation

            zRepu    zAcctRate  zGradRate  zTop10%  zTest75  zTest25
zRepu       1.000
zAccRate   -0.656    1.000
zGradRate   0.804   -0.600      1.000
zTop10%     0.746   -0.708      0.762      1.000
zTest75     0.812   -0.708      0.842      0.845    1.000
zTest25     0.825   -0.731      0.853      0.823    0.961    1.000
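The first step of the strategy can be sketched in a few lines of Python. The numbers are the correlations with zRepu read from the table above; the dictionary name `corr_with_repu` and the function `rank_by_strength` are hypothetical names chosen for this illustration. One assumption is labeled in the code: "highest" is taken to mean largest in absolute value, so that a strong negative correlation (such as acceptance rate) still counts as a strong predictor. This is only a ranking of single-variable correlations, not the finished Predictor formula.

```python
# Correlations with zRepu, copied from the first column of the table above.
corr_with_repu = {
    "zAccRate": -0.656,
    "zGradRate": 0.804,
    "zTop10%": 0.746,
    "zTest75": 0.812,
    "zTest25": 0.825,
}


def rank_by_strength(correlations):
    """Return variable names ordered from strongest to weakest correlation.

    Assumption: strength is the absolute value of the correlation, so a
    large negative correlation is treated as a strong predictor too.
    """
    return sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)


ranking = rank_by_strength(corr_with_repu)
for name in ranking:
    print(f"{name:10s} {corr_with_repu[name]:+.3f}")
```

The ranking only says where to start. Because zTest75 and zTest25 correlate 0.961 with each other, the second variable on the list may add less to the Predictor than its own correlation with Reputation suggests, so each later step still has to be checked against the Predictor's actual correlation with Reputation.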

Notice that the correlations are the same whether we correlate z-scores or we correlate raw scores. Why? Because the formula for correlation converts every score to a z-score before summing the products, and the z-score of a z-score is itself.

Pearson Product-Moment Correlation

            Repu     AcctRate   GradRate   Top10%   Test75   Test25
Repu        1.000
AccRate    -0.656    1.000
GradRate    0.804   -0.600      1.000
Top10%      0.746   -0.708      0.762      1.000
Test75      0.812   -0.708      0.842      0.845    1.000
Test25      0.825   -0.731      0.853      0.823    0.961    1.000
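Why the two tables match can be checked with a small sketch. The five data points below are made up purely for illustration and are not taken from Colleges230; the function names `z_scores` and `pearson_r` are likewise hypothetical. The sketch assumes the population standard deviation (dividing by N), which pairs with averaging the z-score products over N.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw scores to z-scores using the population standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def pearson_r(xs, ys):
    """Pearson correlation as the mean product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Made-up illustrative values, NOT from the college data set.
grad_rate = [55.0, 62.0, 70.0, 81.0, 90.0]
repu = [2.1, 2.6, 2.9, 3.8, 4.3]

r_raw = pearson_r(grad_rate, repu)                   # correlate raw scores
r_z = pearson_r(z_scores(grad_rate), z_scores(repu))  # correlate z-scores

print(round(r_raw, 6), round(r_z, 6))
```

The two printed values agree because `pearson_r` standardizes its inputs anyway, and standardizing scores that already have mean 0 and standard deviation 1 leaves them unchanged.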

The next table shows the steps I went through as I constructed my final Predictor formula, trying to make the highest possible correlation between Predictor and Reputation.