2021 : Simulating the Final Ladder Pre-Season
Around this time last season I created my first ever pre-season simulation of the final home-and-away ladder. We all saw the enduring value that had, so I thought I’d do it again this year.
Again I’ll be using two methodologies, which I call Standard and Heretical, and that are described below. Hopefully this year we’ll be able to compare and contrast these two approaches for a full 23 rounds.
METHODOLOGIES
The Standard methodology uses, as its foundation, the most-recent team ratings and venue performance values from MoSHBODS, and holds these constant throughout the remainder of the simulated season. It introduces uncertainty by randomly perturbing each team’s offensive and defensive ratings for a given round by an amount proportional to the square root of the time between now and when that game will be played. These perturbed ratings are then used in a team scoring model similar to the one discussed in this blog post, which first generates the competing teams’ scoring shots from a bivariate Negative Binomial, and then uses two univariate Beta Binomials to simulate the conversion of those scoring shots. The means of the two Beta Binomials (ie the assumed conversion rates for the home and the away team) are held constant across the season.
This is the methodology I’ve been using for the past few years here on MoS.
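For a sense of how a single game might be simulated under this approach, here is a minimal Python sketch in the same spirit. To be clear, the shared-frailty construction of the bivariate Negative Binomial, the perturbation scale, and every parameter value below are my own illustrative assumptions, not the actual MoSHBODS implementation.

```python
import numpy as np

rng = np.random.default_rng(2021)

def perturbed_rating(rating, weeks_until_game, noise_scale=1.0):
    """Standard-method style perturbation: noise grows with the square root
    of the time until the game is played (noise_scale is illustrative)."""
    return rating + rng.normal(0, noise_scale * np.sqrt(weeks_until_game))

def simulate_game(exp_shots_home, exp_shots_away,
                  conv_mean_home=0.53, conv_mean_away=0.52,
                  dispersion=15.0, conv_precision=80.0):
    """Correlated scoring shots via a shared gamma frailty (one way of
    producing a bivariate Negative Binomial), then Beta Binomial
    conversion of those shots into goals and behinds."""
    frailty = rng.gamma(shape=dispersion, scale=1.0 / dispersion)
    shots_home = rng.poisson(exp_shots_home * frailty)
    shots_away = rng.poisson(exp_shots_away * frailty)

    def score(shots, conv_mean):
        # Draw a conversion rate around the assumed mean, then convert
        # each scoring shot into a goal (6 points) or a behind (1 point)
        a, b = conv_mean * conv_precision, (1 - conv_mean) * conv_precision
        goals = rng.binomial(shots, rng.beta(a, b))
        return 6 * goals + (shots - goals)

    return score(shots_home, conv_mean_home), score(shots_away, conv_mean_away)

# Examples: a perturbed rating for a game 20 weeks away, and one simulated game
print(perturbed_rating(8.0, weeks_until_game=20))
print(simulate_game(exp_shots_home=27, exp_shots_away=24))
```

In the real thing, the expected scoring shots would themselves be derived from the (perturbed) offensive and defensive ratings and venue performance values rather than being supplied directly.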
For the Heretical methodology, we proceed round by round, using exactly the same team scoring model but with team ratings and venue performance values (and expected home and away team conversion rates) that are updated based on the simulated results of previous rounds.
Team ratings under the Heretical methodology therefore follow a trajectory across the season, so that the simulations for Round X within a replicate are necessarily correlated with those for Round X-1, Round X-2, and so on. This is not the case with the Standard methodology, for which the perturbations in any round are unrelated to those in any other round.
That, to be fair, is a selling point for the Heretical methodology, but it comes at a price: it results in desperately slow code. (It’s also logically flawed as a methodology, but I’m not going to argue that case again here.)
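To make that round-by-round dependence concrete, here is a heavily simplified sketch of a single Heretical-style replicate. The Elo-like update, the Normal margins, and the parameter values are purely illustrative stand-ins for the MoSHBODS updates and the team scoring model above.

```python
import numpy as np

rng = np.random.default_rng(0)

def heretical_replicate(ratings, fixture, k=0.1, margin_sd=36.0):
    """One replicate: ratings are updated after every simulated game, so the
    simulated results for later rounds depend on those for earlier rounds."""
    r = dict(ratings)
    margins = {}
    for home, away in fixture:                 # games in chronological order
        expected = r[home] - r[away]
        actual = rng.normal(expected, margin_sd)
        margins[(home, away)] = actual
        r[home] += k * (actual - expected)     # over-performers drift up ...
        r[away] -= k * (actual - expected)     # ... under-performers drift down
    return margins

# Tiny usage example with a made-up three-team fixture
ratings = {"Cats": 12.0, "Tigers": 8.0, "Roos": -15.0}
print(heretical_replicate(ratings, [("Cats", "Roos"), ("Tigers", "Cats")]))
```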
Consequently, we have 50,000 simulation replicates for the Standard method, but only 2,500 for the Heretical method, the obvious impact being that the probability estimates from the Heretical method suffer from substantially more sampling variation (about 4.5 times as much).
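(The 4.5 figure follows from sampling standard errors scaling with the inverse square root of the number of replicates: √(50,000 / 2,500) = √20 ≈ 4.5.)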
LADDER FINISHES
Here, firstly, are the projections for teams’ ladder finishes. The results from the Standard method are on the left, and those from the Heretical method on the right.
At a macro level, we see, as we did last year, that the results are remarkably similar. There is some minor jiggling of the team orderings based on Expected Wins, but no team moves by more than two places, and what jiggling we do see is probably an artefact of the different levels of sampling variation. We also, again as we did last year, get a little more spread in the range of Expected Wins under the Heretical method, with the gap between Geelong and North Melbourne about 7.2 wins under this method compared to 5.8 wins under the Standard method.
The probability estimates for each team for making the Top 8 or Top 4, or for finishing as Minor Premiers, are also very similar, and, under both methodologies, most teams are assessed as having reasonable chances at finishing in quite a wide range of ladder positions.
TEAM AND POSITION CONCENTRATION
There are a number of ways of measuring how much uncertainty there is in the final ladder, including the Gini measure of concentration, which we used for seasons prior to 2020.
As I noted last year, one of the challenges with that measure in practice is in its interpretation. We know that a team with a Gini coefficient of 0.8 has a narrower set of likely final finishes than a team with a Gini coefficient of 0.7, but linking that to anything more interpretable is difficult.
So, I’ve switched to using the Herfindahl-Hirschman Index (HHI) because its inverse has a fairly straightforward interpretation: in the case of the index for a team it can be thought of as the number of ladder positions for which a team is effectively competing, and in the case of the index for a ladder position, it can be thought of as the number of teams effectively competing for that spot.
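By way of illustration (and not the actual MoS code), the team-level index is just the sum of squared probabilities across the 18 ladder positions, and its inverse is the "effective number" of positions being contested:

```python
import numpy as np

def effective_positions(position_probs):
    """HHI for one team across the 18 ladder positions, plus its inverse,
    interpretable as the number of positions the team is effectively
    competing for. The same calculation applied to a single ladder
    position's probabilities across teams gives the effective number of
    teams competing for that spot."""
    p = np.asarray(position_probs, dtype=float)
    hhi = float(np.sum(p ** 2))
    return hhi, 1.0 / hhi

# A team equally likely to finish anywhere is contesting all 18 positions ...
print(effective_positions(np.full(18, 1 / 18)))   # HHI = 1/18, inverse = 18
# ... while a team certain of its spot is contesting just one
print(effective_positions(np.eye(18)[0]))         # HHI = 1, inverse = 1
```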
The HHI figures for the most recent simulation replicates appear below, with those from the Standard methodology on the left, and those from the Heretical methodology on the right.
Again the results are clearly very similar, both in terms of how many ladder positions each team is effectively competing for, and how many teams are effectively competing for each position.
Both methods suggest that most teams are effectively competing for between about 12 and 17 different ladder positions, and that most ladder positions have effectively between 12 and 17 teams competing for them. The exceptions amongst the teams are North Melbourne, Geelong, and Richmond, and amongst the ladder positions 1st, 2nd, 17th, and 18th.
GAME IMPORTANCE
Next, let’s take a look at how both methods estimate the importance of each of the 198 games (see this blog for details about how these are calculated).
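The precise calculation is described in that post; purely as a hedged illustration of the general idea, one might estimate a game's importance from the simulation replicates as the shift in teams' Top 8 probabilities associated with that game's result:

```python
import numpy as np

def game_importance(made_top8, home_won):
    """Hypothetical importance measure for one game: the total absolute
    shift in each team's Top 8 probability between replicates where the
    home team won and those where it lost. Illustrative only, and not
    necessarily the weighted measure used in the linked post.

    made_top8: (n_replicates, n_teams) boolean array of finals berths
    home_won:  (n_replicates,) boolean array for the game in question"""
    p_if_won = made_top8[home_won].mean(axis=0)
    p_if_lost = made_top8[~home_won].mean(axis=0)
    return float(np.abs(p_if_won - p_if_lost).sum())

# Toy usage with random replicate data (an unrelated game shifts almost nothing)
rng = np.random.default_rng(1)
sims = rng.random((10_000, 18)) < 8 / 18      # fake Top 8 outcomes
wins = rng.random(10_000) < 0.6               # fake result of one game
print(game_importance(sims, wins))
```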
Here, firstly, is the list of the 25 most-important games in terms of their estimated influence on the composition of the finalists, according to the Standard methodology’s simulation replicates.
And here is the list according to the Heretical methodology.
Whilst there are obvious differences - the absolute sizes of the weighted importance values are much higher for the Heretical method, for example, though that might be caused by the greater sampling variation - as it turns out, nine games appear among the Top 25 for both methodologies, and, though not shown here, the correlation between the 198 game-by-game weighted importance measures for the two methodologies is +0.73.
Both methodologies show a tendency to assess games later in the season as being more important, though this is more the case for the Standard Method than for the Heretical Method. The Standard Method has 17 of its Top 25 games coming from Rounds 16 to 23, while the Heretical Method has 13 of its Top 25 games coming from Rounds 16 to 23. The chart below, which records the average Weighted Average Importance by home-and-away round number, suggests that this difference might be at least partly attributable to the larger sampling variability of the Heretical Method.
One of the by-products of the Standard approach is that it tends to drag games in the distant future nearer to 50:50 propositions because it introduces larger amounts of variability into them, and variability favours the otherwise-would-be underdog. By contrast, we see that the Heretical methodology appears to produce more similar (albeit noisy) average probabilities for favourites in every round.
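As a quick illustration of that pull towards 50:50 (using a Normal margin purely for simplicity, rather than the actual team scoring model), widening the margin distribution while holding the expected margin fixed shrinks the favourite's win probability:

```python
from scipy.stats import norm

expected_margin = 18.0                # points in favour of the favourite (illustrative)
for sd in (30, 40, 50, 60):           # more distant games get noisier ratings
    p_fav = 1 - norm.cdf(0, loc=expected_margin, scale=sd)
    print(f"margin sd = {sd}: favourite win probability = {p_fav:.3f}")
```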
CONCLUSION
At a high level, both the Standard and Heretical methodologies produce quite similar results for the key team and game simulation metrics, with the methodological differences being largely swamped by the fact that both use the same underlying initial team ratings, venue performance values (and schedule).
In fact, the correlation between the teams’ Expected Wins under the Standard methodology and the teams’ pre-season ratings is +0.99, and between the teams’ Expected Wins under the Heretical methodology and the teams’ pre-season ratings is +0.98.
The two methods do, however, produce some small differences that might have practical significance - for example in the exact probabilities they attach to certain team ladder outcomes. If we assume that bookmakers are the best estimators of these probabilities, then it might be worth comparing the two methodologies’ estimates with those of the TAB bookmaker. Such a comparison reveals that:
The Top 8 probabilities from the Standard methodology are, in aggregate absolute percentage point difference terms, closer to the implicit Top 8 probabilities of the TAB bookmaker
The Top 4 and Minor Premiership probabilities from the Heretical methodology are, in aggregate absolute percentage point difference terms, closer to the implicit probabilities of the TAB bookmaker for those same markets
We can also say that both methodologies are heavily influenced by MoSHBODS’ relatively more bearish views about the Eagles’ prospects compared to the TAB bookmaker, but that’s a conversation for another day.