The Benefits of Shrinkage
The other day I was reviewing the forecasts of the modellers on Squiggle, a website that aggregates the opinions of a dozen or so people - including me - who attempt to predict, among other things, the final margins in games of men’s AFL.
In particular, I was wondering how well we’d adjusted our forecasts to deal with the fact that games had been reduced in length by 20% - a change made due to COVID and a desire to reduce the strain on players so that more games could be played in a shorter timeframe.
To perform the analysis, I calculated the optimal (in hindsight) fixed multiplier that could be applied to each forecaster’s predictions in order to minimise his or her mean absolute error (MAE).
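Here’s a minimal sketch of that calculation in Python - my own illustration rather than the exact code used, with forecasts and actuals standing for hypothetical arrays of a forecaster’s predicted and actual game margins:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def optimal_fixed_multiplier(forecasts, actuals):
    """Find the fixed multiplier k minimising the mean absolute error
    between k * forecast and the actual game margin."""
    forecasts, actuals = np.asarray(forecasts), np.asarray(actuals)
    mae = lambda k: np.mean(np.abs(actuals - k * forecasts))
    result = minimize_scalar(mae, bounds=(0.0, 2.0), method="bounded")
    return result.x, result.fun  # the best multiplier and the MAE it achieves
```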
The results are shown in the table at right and reveal that most forecasters would enjoy modest MAE improvements by multiplying all of their raw forecasts by some number between 0.6 and 1.2. Interestingly, the majority of forecasters would benefit from using a multiplier less than 1.
Taken at face value, these results would seem to imply that most of the forecasters haven’t discounted their raw forecasts by enough to cater for the shorter game length.
When I Tweeted this result, Max Barry, owner of the Squiggle website, asked an excellent question that leads naturally to a test of that claim: why not perform the same calculations for earlier seasons?
So, I did this for 2019, which showed optimal multipliers under 1 for almost all forecasters.
Did this mean that forecasters were somehow systematically providing forecasts that were too large in an absolute sense? Or, was something else going on?
Might it actually instead be optimal for forecasters to routinely shrink their raw forecasts towards zero and, if so, why?
SIMULATIONS
To investigate that possibility we’ll idealise the situation a little and assume that:
The True Margins for all games in a season are drawn from a Normal distribution with mean 0 and some standard deviation SD_True. If we assume that the TAB Bookmaker’s pre-game handicaps are perfect estimators of True Margins, then we can estimate SD_True as 14.7 for 2020, 19.0 for 2019, and 24.4 for 2018. Roughly speaking then, the SD under shorter quarters has been about 80% of what it was in 2019, although it is, of course, affected by the average difference between team abilities in each game, as well as by the games’ lengths.
The Forecast Margins in each game for a particular forecaster are drawn from a Normal distribution with mean equal to the True Margin (ie we assume the forecaster is unbiased) and some standard deviation, SD_Forecasts. If we again assume that the TAB Bookmaker’s pre-game handicaps are perfect estimators of True Margins, then we can estimate SD_Forecasts for MoSSBODS_Marg, MoSHBODS_Marg, and MoSHPlay_Marg, and obtain values of about 8 to 10 as sensible estimates.
The Actual Margins in each game are drawn from a Normal distribution with mean equal to the True Margin and some standard deviation SD_Actual. If we, for a third time, assume that the TAB Bookmaker’s pre-game handicaps are perfect estimators of True Margins, then we can estimate SD_Actual. We get 28.7 for 2020, 34.4 for 2019, and 33.3 for 2018. So, this year’s SD_Actual is just under 85% of last season’s. (A sketch of how these three SDs might be estimated from historical data appears below.)
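Here’s one way those estimates might be computed, assuming (as above) that the bookmaker’s pre-game handicap, expressed as an expected margin, is a perfect estimator of each game’s True Margin. The array names are hypothetical:

```python
import numpy as np

def estimate_sds(handicaps, actuals, forecasts):
    """Estimate SD_True, SD_Actual and SD_Forecasts, treating the bookmaker's
    handicap as a perfect estimate of each game's true expected margin."""
    h, a, f = map(np.asarray, (handicaps, actuals, forecasts))
    sd_true = h.std(ddof=1)             # variability of true expected margins
    sd_actual = (a - h).std(ddof=1)     # on-the-day noise about expectations
    sd_forecasts = (f - h).std(ddof=1)  # forecaster error about expectations
    return sd_true, sd_actual, sd_forecasts
```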
We proceed by choosing different values of SD_True, SD_Forecasts, and SD_Actual, and investigating what multiplier of the raw forecasts gives us the lowest MAE and lowest RMSE. In this first set of simulations we set SD_True to 20 and SD_Actual to 30, and allow SD_Forecasts to range from 0 to 20 in steps of 5. For each of the five values of SD_Forecasts we run 10,000 replicates of 1,000-game “seasons”, calculating the optimal multiplier for each season and then plotting the resulting density curves.
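Here’s a minimal sketch of one such simulation - my own reconstruction, using 1,000 replicates per setting rather than 10,000 to keep the run-time modest:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)

def season_optimal_multiplier(sd_true, sd_forecasts, sd_actual, n_games=1000):
    """Simulate one season under the Normal model and return the fixed
    multiplier that minimises that season's MAE."""
    true = rng.normal(0.0, sd_true, n_games)    # true expected margins
    forecasts = rng.normal(true, sd_forecasts)  # unbiased but noisy forecasts
    actuals = rng.normal(true, sd_actual)       # actual game margins
    mae = lambda k: np.mean(np.abs(actuals - k * forecasts))
    return minimize_scalar(mae, bounds=(0.0, 2.0), method="bounded").x

# SD_True = 20, SD_Actual = 30, SD_Forecasts from 0 to 20 in steps of 5
for sd_f in range(0, 25, 5):
    ks = [season_optimal_multiplier(20, sd_f, 30) for _ in range(1000)]
    print(f"SD_Forecasts = {sd_f:2d}: mean optimal multiplier = {np.mean(ks):.3f}")
```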
What we find is that the (mean) optimal multiplier is always 1 or less, and that it moves closer to zero as SD_Forecasts grows - that is, the worse the forecaster is (in the sense of being, on average, further from the true expected margin), the more shrinkage is optimal.
When I Tweeted this result, another of my highly-numerate followers, Darren O’Shaughnessy, quickly suspected that the optimal multiplier was a function of SD_Forecasts and SD_True - specifically, that:

Optimal Multiplier = SD_True^2 / (SD_True^2 + SD_Forecasts^2)
To test this, I ran a further set of simulations (not shown here), this time allowing all three SDs to vary. The results confirmed that the optimal multiplier could, indeed, be obtained via that equation. (For the technically curious, it looks like we have a version of a James-Stein estimator here; a derivation sketch appears after the list below.)
What this equation tells us is that:
The optimal multiplier is always between 0 and 1
It gets nearer 1 the larger is SD_True relative to SD_Forecasts (ie the greater the underlying variability of true margin expectations relative to the variability of the forecaster’s predictions about those true margin expectations). In other words, if true expected margins vary a lot relative to the errors in our forecasts, we shouldn’t shrink our forecasts by much.
It gets nearer 0 the smaller is SD_True relative to SD_Forecasts (ie the smaller the underlying variability of true margin expectations relative to the variability of the forecaster’s predictions about those true margin expectations). In other words, if our forecasts are highly variable around a relatively non-varying set of true margin expectations, we should shrink our forecasts towards zero by quite a lot.
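Where does the equation come from? Here’s a short derivation sketch under the model assumptions above (my own reconstruction, not taken from the simulations):

```latex
% Model: True margin T ~ N(0, \sigma_T^2); Forecast F = T + e_F with
% e_F ~ N(0, \sigma_F^2); Actual A = T + e_A with e_A ~ N(0, \sigma_A^2).
\begin{align*}
A - kF &= (1-k)\,T + e_A - k\,e_F \\
\operatorname{Var}(A - kF) &= (1-k)^2 \sigma_T^2 + \sigma_A^2 + k^2 \sigma_F^2
\end{align*}
% A - kF is Normal with mean zero, and the MAE of a zero-mean Normal is
% proportional to its standard deviation, so minimising MAE (or RMSE) means
% minimising this variance. Setting the derivative with respect to k to zero:
\[
k^{*} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_F^2}
\]
```

Note that SD_Actual drops out entirely: on-the-day randomness affects how large the benefit of shrinkage is, but not the optimal multiplier itself.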
Armed with this equation, I ran one final set of simulations to estimate how large the benefit of applying the correct multiplier would be for different combinations of the three standard deviations. (A closed-form version of this calculation is sketched after the results below.)
The results for SD_Actual equal to 30 appear in the table below and at right, and show that:
The improvements are generally small
They increase, for a given SD_Forecasts, as SD_True reduces. In other words, for a given “quality” of forecaster, the benefits from getting the right multiplier increase the less variable are expected margins from game to game (ie the more constant is the gap in quality between opponents)
They increase, for a given SD_True, as SD_Forecasts increases. In other words, for a given level of variability in expected margins across games, the benefits from getting the right multiplier increase the poorer the “quality” of forecaster
The largest benefits accrue when the variability in expected margins is smallest and the variability in forecast margins about the true margin is greatest, though even here the improvement is less than 3%.
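Because the MAE of a zero-mean Normal is proportional to its standard deviation, those percentage improvements can also be computed in closed form rather than by simulation. Here’s a sketch (the grid values are illustrative, not necessarily those used for the tables):

```python
import numpy as np

def pct_mae_improvement(sd_true, sd_forecasts, sd_actual):
    """% reduction in MAE from using the optimal multiplier rather than
    the raw (multiplier = 1) forecasts, under the Normal model."""
    var_raw = sd_actual**2 + sd_forecasts**2  # Var(Actual - Forecast)
    var_opt = sd_actual**2 + (sd_true**2 * sd_forecasts**2) / (sd_true**2 + sd_forecasts**2)
    return 100 * (1 - np.sqrt(var_opt / var_raw))

for sd_true in (15, 20, 25, 30):
    for sd_forecasts in (4, 8, 12):
        print(sd_true, sd_forecasts, round(pct_mae_improvement(sd_true, sd_forecasts, 30), 2))
```

Calling the same function with sd_actual set to 35 gives the comparison that follows.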
As a final comparison, we look at the same grid of values for SD_Forecasts and SD_True, and set SD_Actual equal to 35.
By comparing the results here with those from the table above, we can add that:
In percentage terms, the rewards from using the correct multiplier reduce as the underlying variability in actual results (about their true expectation) increases. Put another way, if on-the-day and random effects are more significant and lead to greater unpredictability in final margins, using the optimal multiplier yields smaller rewards.
Roughly speaking, the percentage reductions are only about three-quarters of what they were for the situation with SD_Actual equal to 30.
PRACTICAL IMPLICATIONS
The implications of this result would be obvious for a forecaster if he or she could:
accurately predict SD_True for an upcoming season
accurately estimate his or her SD_Forecasts
know that he or she would produce unbiased forecasts
In that case, applying the optimal multiplier to his or her raw forecasts would produce superior MAE (and RMSE) results. But, in reality, accurate estimates can’t be guaranteed and misestimation is inevitable.
And, there are penalties to misestimating these parameters and therefore using an incorrect multiplier. In the table at right we look at how much (and, in some cases, whether) we improve our MAE by applying a multiplier calculated assuming that SD_True was 30 in a world where, it turns out, SD_True was 25.
The first row provides the results for a world where SD_Forecasts was 2. Had we estimated our multiplier assuming that SD_Forecasts was 1, we’d have enjoyed no improvement in our MAE compared to using a multiplier of 1. Moving further along that row we see that, had we assumed SD_Forecasts was 12, we’d actually be 0.6% worse off in applying our multiplier than if we’d just used the raw forecasts.
As we move down the table we get the results for different actual values of SD_Forecasts.
Generally speaking, it seems to be the case - at least for the scenarios shown here - that overestimating SD_Forecasts is better than underestimating it unless you think there’s a genuine possibility that SD_Forecasts is very low. By overestimating SD_Forecasts, we obviously miss out on the very best outcomes (as shown in the rightmost column), but we make it more likely that we will obtain some benefit.
The figures in grey in the table give us an idea of the effects of misestimating only SD_True. We see that these are relatively small. Far more deleterious is misestimating SD_Forecasts.
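The same closed-form logic can be used to sketch the penalty from a mis-specified multiplier. The function below is my own construction, and I’ve assumed SD_Actual is 30 (not stated explicitly for this table, though it does reproduce the 0.6% figure above):

```python
import numpy as np

def pct_mae_change(sd_true, sd_forecasts, sd_actual,
                   assumed_sd_true, assumed_sd_forecasts):
    """% change in MAE (negative = improvement) from applying a multiplier
    based on assumed - and possibly wrong - SDs, versus the raw forecasts."""
    k = assumed_sd_true**2 / (assumed_sd_true**2 + assumed_sd_forecasts**2)
    var_raw = sd_actual**2 + sd_forecasts**2
    var_k = (1 - k)**2 * sd_true**2 + sd_actual**2 + k**2 * sd_forecasts**2
    return 100 * (np.sqrt(var_k / var_raw) - 1)

# The scenario above: SD_True is really 25 but assumed to be 30, and
# SD_Forecasts is really 2 but assumed to be 12 - about 0.6% worse MAE
print(round(pct_mae_change(25, 2, 30, 30, 12), 2))
```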
We’d need to do some more simulation of plausible scenarios, though, before reaching a firm conclusion on what, practically, might be the best course of action. And, we’d also need to consider the effects of the forecasts not being unbiased, but that, too, is for another day.
CONCLUSION
It’s plausible that a forecaster, striving to produce forecasts with the smallest possible MAE, should shrink his or her forecasts towards zero, with the amount of shrinkage determined by how variable expected margins are across games, and by how variable his or her forecasts are around the true expectations for a given game.