Predicting Total Game Scores Versus Predicting Margins
In the comments section of the previous blog, LT pointed out that Bookmakers seem to be doing a better job this year of predicting the sum of the Home Team and Away Team scores than of predicting the difference between them. That is, the total they set in the Under/Over market tends to be closer to the actual final Total Score than the negative of the handicap they set in the Line market is to the actual final Game Margin.
That's an astute observation, and explaining why it's likely to often be the case provides a neat opportunity for a little statistics.
All we need for our "proof" is to assume that the Home Team's and the Away Team's scores are random variables - that is, that at least some of the value they assume at game's end is attributable to purely random factors. We need not make any assumptions about the particular statistical distribution of those random factors, just that they are present.
Given that sole assumption, it follows that the sum of the Home Team and Away Team scores, the Total Score of the game, is also a random variable, as is their difference, the Game Margin. A couple of standard statistical results apply to the variability of such sums and differences, and these are presented as equations (1) and (2) at right. (For first-principles proofs of these identities, see here.)
In words, Equation (1) tells us that the variance of the Total Score is equal to the variance of the Home Team score plus the variance of the Away Team score PLUS twice the covariance between them.
Equation (2) tells us that the variance of the Game Margin is equal to the variance of the Home Team score plus the variance of the Away Team score MINUS twice the covariance between them.
Intuitively, we might expect Home Team and Away Team scores to be negatively correlated because the overachievement of one team relative to its pre-game expectation seems more likely to be associated with underachievement by its opponent. And, indeed, we've shown before that Home Team and Away Team scores have historically been negatively correlated in practice, which means that their covariance has been negative too.
That leads us to Equation (3), which shows that the variance of the Total Score will be less than the variance of the Game Margin whenever the covariance between the Home Team and Away Team scores is negative. And that is, as we've just said, its normal state.
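For readers who prefer symbols, the three results described in words above can be written out as follows. The notation is our own shorthand rather than anything official: H is the Home Team score and A is the Away Team score, so H + A is the Total Score and H - A is the Game Margin. The form shown for Equation (3) is one natural way to express it, obtained by subtracting (2) from (1).

```latex
\begin{align}
\operatorname{Var}(H + A) &= \operatorname{Var}(H) + \operatorname{Var}(A) + 2\operatorname{Cov}(H, A) \tag{1} \\
\operatorname{Var}(H - A) &= \operatorname{Var}(H) + \operatorname{Var}(A) - 2\operatorname{Cov}(H, A) \tag{2} \\
\operatorname{Var}(H + A) - \operatorname{Var}(H - A) &= 4\operatorname{Cov}(H, A) \tag{3}
\end{align}
```

The right-hand side of (3) is negative exactly when the covariance is negative, which is the condition under which the Total Score is less variable than the Game Margin.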
So, to summarise: if Home Team and Away Team scores are negatively correlated random variables then the variance of their sum will be less than the variance of their difference.
Inherently then, Total Scores are less variable than Game Margins.
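A quick simulation makes the point concrete. What follows is only a sketch under assumptions of our own choosing: scores are drawn from a Normal distribution purely for convenience (the argument above does not need that), and the mean of 90, standard deviation of 25 and correlation of -0.3 are illustrative values, not estimates from real games. The function names are hypothetical too.

```python
import random
import statistics


def simulate_scores(n_games, rho, mean=90.0, sd=25.0, seed=1):
    """Draw (home, away) score pairs with correlation rho.

    The Normal distribution and the default mean/sd are illustrative
    assumptions only, not values fitted to actual results.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_games):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        home = mean + sd * z1
        # Mix z1 into the away score so that corr(home, away) = rho.
        away = mean + sd * (rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2)
        pairs.append((home, away))
    return pairs


def covariance(xs, ys):
    """Sample covariance, using the same n - 1 divisor as statistics.variance."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def summarise(pairs):
    """Variances of the scores, their sum (Total) and their difference (Margin)."""
    home = [h for h, _ in pairs]
    away = [a for _, a in pairs]
    totals = [h + a for h, a in pairs]
    margins = [h - a for h, a in pairs]
    return {
        "var_home": statistics.variance(home),
        "var_away": statistics.variance(away),
        "cov": covariance(home, away),
        "var_total": statistics.variance(totals),
        "var_margin": statistics.variance(margins),
    }


stats = summarise(simulate_scores(n_games=20000, rho=-0.3))
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

With a negative correlation the printed variance of the totals should come out smaller than that of the margins, and both should match the sum-and-difference identities up to floating-point rounding. Changing rho to a positive value should reverse the ordering.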
Nonetheless, it is possible that a forecaster could produce Total Score forecasts with larger average errors than his or her Game Margin forecasts if he or she is more biased or less precise in forecasting Total Scores than in forecasting Game Margins.
However, if he or she first forecasts Home Team scores and Away Team scores separately and independently, then combines these base forecasts to form Total Score and Margin forecasts, the variability of the Total Score and Margin forecasts must be identical, because the covariance between independent base forecasts is zero. Provided the base forecasts are also unbiased, so are both combined forecasts, and he or she will naturally produce "better" Total Score forecasts than Game Margin forecasts.
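The forecasting scenario just described can be sketched in the same way. Again, everything here is an assumption made for illustration: the forecaster is hypothetical, his or her forecast of each team's score is the true expected score plus independent noise (standard deviation 5), and actual scores are Normal with a correlation of -0.3.

```python
import random
import statistics


def mean_abs_errors(n_games=20000, rho=-0.3, mean=90.0, sd=25.0,
                    forecast_sd=5.0, seed=2):
    """Mean absolute error of Total and Margin forecasts built from
    separate, independent, unbiased forecasts of each team's score.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    total_errs, margin_errs = [], []
    for _ in range(n_games):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Actual scores, negatively correlated when rho < 0.
        home = mean + sd * z1
        away = mean + sd * (rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2)
        # Base forecasts: unbiased, with independent noise for each team.
        f_home = mean + rng.gauss(0.0, forecast_sd)
        f_away = mean + rng.gauss(0.0, forecast_sd)
        total_errs.append(abs((home + away) - (f_home + f_away)))
        margin_errs.append(abs((home - away) - (f_home - f_away)))
    return statistics.fmean(total_errs), statistics.fmean(margin_errs)


mae_total, mae_margin = mean_abs_errors()
print(f"MAE of Total Score forecasts: {mae_total:.1f}")
print(f"MAE of Game Margin forecasts: {mae_margin:.1f}")
```

Because the two combined forecasts are equally noisy by construction, any gap between the two printed errors comes from the outcomes themselves: Totals are simply less variable than Margins when the scores are negatively correlated.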
All of which leads us back to LT's observant comment. Thanks LT.