Establishing Metrics for Margin, Total Score and Team Score Predictions
These days, I reckon I know what a good margin forecaster looks like. Any person or algorithm - and I'm still at the point where I think there's a meaningful distinction to be made there - who (that?) can consistently predict margins within 30 points of the actual result is unarguably competent. That benchmark is based on the empirical performances I've seen from others and measured for my own forecasting models across the last decade of analysing football.
I don't, though, have the same empirical feel for forecasts of individual team or aggregate scores (despite having written a few vaguely relevant posts on predicting total scores and "proving" that the variance of total scores must be less than that of margins).
Today, then, I want to investigate the characteristics of forecast errors when predicting margins, individual team scores and total scores, using MoSSBODS Team Ratings and estimates of Venue effects to generate the forecasts. It's true, of course, that MoSSBODS might not produce the best possible forecasts, so the errors we'll be investigating will be partly attributable to shortcomings and biases in its forecasting methodology, and partly attributable to the inherent randomness of football itself. There is, though, no way around that.
Let's start by reviewing the season by season mean absolute errors (MAEs) in MoSSBODS' margin, individual team and aggregate score forecasts. It's a way of measuring by how much - in either direction - forecasts tend to differ from actual results.
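To make the three error types concrete, here is a minimal sketch of how Margin, Total Score and Team Score errors (and their MAEs) might be derived from a set of home/away score forecasts. The scores are made up, the function names are hypothetical, and the sign convention (actual minus forecast) is an assumption, since the post doesn't say which direction it uses.

```python
from statistics import mean

# Hypothetical (home, away) forecasts and results in points; not MoSSBODS output.
forecasts = [(95, 80), (88, 92), (110, 70)]
actuals = [(102, 75), (70, 99), (96, 81)]

def score_errors(forecasts, actuals):
    """Return per-game margin errors, per-game total errors and per-team errors.

    Assumed sign convention: error = actual - forecast.
    """
    margin_err, total_err, team_err = [], [], []
    for (fh, fa), (ah, aa) in zip(forecasts, actuals):
        margin_err.append((ah - aa) - (fh - fa))
        total_err.append((ah + aa) - (fh + fa))
        # Each game contributes two Team Score errors, one per team.
        team_err.extend([ah - fh, aa - fa])
    return margin_err, total_err, team_err

def mae(errors):
    """Mean absolute error: the average size of a miss, ignoring its direction."""
    return mean(abs(e) for e in errors)

m, t, s = score_errors(forecasts, actuals)
print(round(mae(m), 2), round(mae(t), 2), round(mae(s), 2))  # 20.67 5.33 10.33
```

Note that each game's Margin error is the home Team Score error minus the away one, and its Total Score error is their sum, which is why the three MAEs behave differently even though they come from the same forecasts.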
The top panel in the chart below tracks the season average MAEs for MoSSBODS' margin forecasts, the thick black line recording the actual averages and the thinner blue line providing a loess estimate of the MAE for a neighbourhood of seasons around each season - essentially a sophisticated moving average. We see that the actual Margin MAE has been falling since about 1980, to its current level of just under 30 points per game.
Total Score MAEs and Team Score MAEs have generally been falling over this period too, and now lie at around 25 points per game and about 20 points per team per game respectively.
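For readers who want to reproduce a smoothed line like the blue one, below is a minimal local-regression sketch. It is an assumption about the method, not the settings used for the chart: it fits a weighted straight line (degree 1) around each season, using tricube weights over the nearest `span` fraction of seasons, whereas R's `loess()`, for example, defaults to local quadratic fits with a span of 0.75. The function name `loess` and its `span` default are illustrative only.

```python
import math

def loess(xs, ys, span=0.5):
    """Local linear loess sketch (assumed degree 1, tricube weights).

    For each x0, fit a weighted straight line to the nearest ceil(span * n)
    points and evaluate it at x0. Assumes at least 3 distinct x values.
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(span * n)))  # neighbourhood size
    fitted = []
    for x0 in xs:
        d_max = sorted(abs(x - x0) for x in xs)[k - 1]
        # Tricube weights; points at or beyond the k-th nearest distance get 0.
        w = [(1 - (abs(x - x0) / d_max) ** 3) ** 3 if abs(x - x0) < d_max else 0.0
             for x in xs]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0  # flat fit if only one point has weight
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

A smaller `span` tracks season-to-season movements more closely; a larger one produces a smoother trend, much like widening a moving-average window.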
As a very rough rule of thumb then, in recent times you'd want to have been predicting Team Scores to within about 3 goals, Aggregate Scores to within about 4 goals, and Margins to within about 5 goals.
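The conversion behind that rule of thumb is simply points divided by six, since a goal is worth 6 points (and a behind 1). A trivial sketch, with a hypothetical helper name:

```python
POINTS_PER_GOAL = 6  # an AFL goal is worth 6 points; a behind is worth 1

def mae_in_goals(mae_points):
    """Express an MAE measured in points as an equivalent number of goals."""
    return mae_points / POINTS_PER_GOAL

for label, mae_pts in [("Team Score", 20), ("Total Score", 25), ("Margin", 30)]:
    print(f"{label}: {mae_in_goals(mae_pts):.1f} goals")
# Team Score: 3.3 goals
# Total Score: 4.2 goals
# Margin: 5.0 goals
```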
Across the history of VFL/AFL, Victory Margins, Team and Total Scores have not been static, however, as the chart below reveals.
One prominent feature of this chart is that Team and Total Scores, like MAEs, have been falling since about 1980. That suggests it might be interesting to plot the earlier MAEs as percentages of the average Total Score in each season (note that the average Total Score is, by definition, twice the average Team Score, since every game's Total Score is the sum of its two Team Scores).
Whilst none of the lines is completely flat, it seems that, roughly speaking, Margin MAEs in a season are about 17% of the average Total Scores in that season, Total Score MAEs are about 15%, and Team Score MAEs are about 11%.
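The normalisation itself is straightforward: divide each season's MAE by that season's average Total Score. The sketch below assumes one row per forecast, holding the season, the absolute error and the game's Total Score; the function name and row layout are hypothetical. For Team Score errors, each game would contribute two rows carrying the same game Total, which leaves the season's average Total Score unchanged.

```python
from collections import defaultdict
from statistics import mean

def normalised_mae_by_season(rows):
    """rows: iterable of (season, abs_error, total_score).

    Returns {season: MAE as a percentage of that season's average Total Score}.
    """
    errs, totals = defaultdict(list), defaultdict(list)
    for season, abs_err, total in rows:
        errs[season].append(abs_err)
        totals[season].append(total)
    return {s: 100 * mean(errs[s]) / mean(totals[s]) for s in errs}
```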
So, we've now forged an empirical sense of mean (absolute) forecast errors. What about the distributions of those errors?
We've suspected from as long ago as 2009 that Margin errors for expert forecasters might inherently be Normal (and we've investigated theoretical reasons why this might be the case). A review of MoSSBODS' errors suggests that they too might be Normally distributed, or nearly so, at least in a large number of seasons.
Of recent seasons, 2015's errors look the least Normal of all. Their mean is -0.52 points per game and their standard deviation 37.7 points. A range spanning one standard deviation either side of the mean encompasses 65% of the distribution, and one spanning two standard deviations encompasses 96%. These figures are not all that different from those for a Normal distribution, for which about 68% and 95% of the distribution lies within one and two standard deviations of the mean respectively. The MoSSBODS errors have a skewness of -0.28 (compared to 0 for a Normal distribution) and a kurtosis of 2.79 (compared to 3 for a Normal distribution).
These results suggest some deviation from Normality in the 2015 MoSSBODS errors, but probably not so much that a Normal distribution couldn't serve as a useful first-order approximation.
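The summary statistics quoted for each season's errors can be computed along the following lines. This is a sketch, not the author's code: it assumes population (divide-by-n) moments and reports non-excess kurtosis, matching the convention above of 3 for a Normal distribution, though the post doesn't state which estimators were used. The helper `normal_coverage` gives the Normal benchmark for comparison; both function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def normality_summary(errors):
    """Mean, SD, share within 1 and 2 SDs of the mean, skewness, (non-excess) kurtosis."""
    m = mean(errors)
    sd = pstdev(errors, m)  # population SD, an assumed choice of estimator

    def within(k):
        return sum(abs(e - m) <= k * sd for e in errors) / len(errors)

    z = [(e - m) / sd for e in errors]
    return {
        "mean": m,
        "sd": sd,
        "within_1sd": within(1),
        "within_2sd": within(2),
        "skewness": mean(v ** 3 for v in z),
        "kurtosis": mean(v ** 4 for v in z),  # equals 3 for a Normal distribution
    }

def normal_coverage(k):
    """Share of any Normal distribution lying within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

print(round(normal_coverage(1), 4), round(normal_coverage(2), 4))  # 0.6827 0.9545
```

For a season of roughly 200 games, sample-adjusted versions of the SD, skewness and kurtosis would differ only slightly from these population versions.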
The 2014 MoSSBODS Margin errors are much more Normal in nature, with a mean of -0.07 points per game, a standard deviation of 38.6 points, and one standard deviation from the mean encompassing 71% of the distribution, two standard deviations 94%. Their skewness is -0.03 and their kurtosis 3.20.
MoSSBODS' Total Score forecast errors look a little less Normal in many seasons, often appearing skewed, having a non-zero mean (implying some bias in MoSSBODS' forecasts for that season), and being extremely flat. Exactly what statistical distribution might best explain the data for each season is an investigation for another day, as is any attempt to explore what the drivers of each distribution's parameters might be.
Focussing on the errors for last season (2015) we find that the mean is +10.3 points and the standard deviation 29.4 points. One standard deviation from the mean encompasses 71% of the errors, and two standard deviations encompasses 96%.
In this sense, the 2015 errors are at least approximately (non-centred) Normal, a conclusion that is somewhat reinforced by the fact that the skewness of the error distribution is +0.033, and the kurtosis is 3.15.
A similar analysis of the 2014 MoSSBODS Total Score errors (mean +5.6 points, standard deviation 29.8 points, skewness +0.119, kurtosis 3.18) leads to a similar conclusion, although the error distribution is visibly bimodal.
So, at least for recent seasons, it's probably reasonable to assume that Total Score errors are approximately Normally distributed with a standard deviation of about 30 points. (By way of comparison, recall that the standard deviation of MoSSBODS' Margin errors in 2015 was about 38 points.)
Lastly then, let's look at the distribution of MoSSBODS' Team Score forecast errors.
These look perhaps a little more Normal, and more often, though there are still many seasons for which Normality seems an unreasonable assumption.
Looking just at the errors for 2015, we find a mean of +5.2 points, a standard deviation of 23.9 points, and that one standard deviation encompasses 68% of the distribution and two standard deviations encompasses 97%. That would again suggest that an assumption of Normality is not too egregious, though the empirical skewness of +0.236 and kurtosis of 2.67 mean that a little caution is required.
Analysing 2014 errors alone produces a similar conclusion, the mean being +2.8 points, the standard deviation 24.4 points, with one standard deviation encompassing 69% of the distribution and two standard deviations encompassing 96%. The skewness is +0.28 and the kurtosis 3.03.
Here too then it's probably a reasonable first-order approximation to assume that Team Score errors, at least for recent seasons, have been Normally distributed with a standard deviation of around 24 points.
SUMMARY CONCLUSION
Our empirical analysis suggests the following reasonable rules of thumb in relation to contemporary forecasting errors for Margins, Total Scores and Team Scores:
- Margin Forecasts: acceptable MAE = 30 points, approximately Normally distributed with a standard deviation of about 37 points
- Total Score Forecasts: acceptable MAE = 25 points, approximately Normally distributed with a standard deviation of about 30 points
- Team Score Forecasts: acceptable MAE = 20 points, approximately Normally distributed with a standard deviation of about 24 points
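One way to check that the MAE and standard deviation rules of thumb hang together: for zero-mean Normal errors with standard deviation σ, the expected absolute error is σ√(2/π), so σ ≈ 1.25 × MAE. The sketch below applies that identity to the MAEs above. It assumes unbiased (zero-mean) errors, which the Total and Team Score errors only approximately satisfy; the function name is hypothetical.

```python
import math

def normal_sd_from_mae(mae):
    """For zero-mean Normal errors, E|error| = sd * sqrt(2/pi), so sd = mae * sqrt(pi/2)."""
    return mae * math.sqrt(math.pi / 2)

for label, mae in [("Margin", 30), ("Total Score", 25), ("Team Score", 20)]:
    print(f"{label}: MAE {mae} -> implied SD {normal_sd_from_mae(mae):.1f}")
# Margin: MAE 30 -> implied SD 37.6
# Total Score: MAE 25 -> implied SD 31.3
# Team Score: MAE 20 -> implied SD 25.1
```

The implied values sit close to the empirical standard deviations listed above, which is roughly what we'd expect if the errors are approximately Normal.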