2020 : Simulating the Final Ladder Pre-Season

I know the received wisdom is that one should be wary of being too influenced by one’s peers, but here I am, pre-season, putting together projections for the final home and away ladder, mostly because I can see others doing it. That completely contradicts the views I’ve expressed in previous years about the folly of attempting such a forecasting task, given the huge amount of uncertainty associated with it. Is that progress or capitulation? You be the judge.

And, while I’m doing things I’ve previously sworn never to do, for this blog post I’m going to compare the results from my usual methodology with those that I get by updating team ratings after each round of results. I’ve previously commented on the inherent illogicality of this practice - the argument being that it’s absurd to update your ratings on the basis of a random outcome that is, by definition, entirely consistent with your original ratings. But, a lot of intelligent people do use this technique and, I admit, it is not without its possible merits, despite its fundamentally illogical basis.

For the purposes of exposition, let me call the two methods the Standard and the Heretical.

METHODOLOGIES

The Standard methodology uses, as its foundation, the most recent team ratings and venue performance values from MoSH2020, and holds these constant throughout the remainder of the simulated season. It introduces uncertainty by randomly perturbing each team’s offensive and defensive ratings for a given round by an amount proportional to the square root of the time between now and when that game will be played. These perturbed ratings are then used in a team scoring model similar to the one discussed in this blog post, which first generates the competing teams’ scoring shots from a bivariate Negative Binomial, and then uses two univariate Beta Binomials to simulate the conversion of those scoring shots. The means of the two Beta Binomials (ie the assumed conversion rates for the home and the away teams) are held constant across the season.

This is the methodology I’ve been using for the past few years here on MoS.
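For anyone curious about the mechanics, here’s a minimal sketch of how a single game might be simulated under the Standard approach. The functional forms and every parameter value (the perturbation scale, the frailty shape, the mean conversion rates and the Beta Binomial precision) are illustrative assumptions rather than the actual MoSH2020 settings, and the shared gamma frailty is just one simple way of generating correlated Negative Binomial scoring-shot counts.

```python
import numpy as np

rng = np.random.default_rng(2020)

def perturbed_rating(rating, rounds_ahead, sd_per_sqrt_round=1.5):
    # Standard method: jitter a rating by noise whose standard deviation
    # grows with the square root of how far ahead the game is played
    # (the scale of 1.5 is an arbitrary illustrative value)
    return rating + rng.normal(0.0, sd_per_sqrt_round * np.sqrt(rounds_ahead))

def simulate_game(mu_home_shots, mu_away_shots,
                  conv_home=0.53, conv_away=0.52,
                  frailty_shape=30.0, bb_precision=80.0):
    # Scoring shots: a gamma-Poisson mixture with a frailty term shared by
    # both teams, giving Negative Binomial marginals that are positively
    # correlated (one simple construction of a bivariate Negative Binomial)
    frailty = rng.gamma(frailty_shape, 1.0 / frailty_shape)
    shots_home = rng.poisson(mu_home_shots * frailty)
    shots_away = rng.poisson(mu_away_shots * frailty)

    # Conversion: Beta Binomial draws around season-constant mean rates
    p_home = rng.beta(conv_home * bb_precision, (1 - conv_home) * bb_precision)
    p_away = rng.beta(conv_away * bb_precision, (1 - conv_away) * bb_precision)
    goals_home = rng.binomial(shots_home, p_home)
    goals_away = rng.binomial(shots_away, p_away)

    # Goals are worth 6 points; unconverted shots (behinds) are worth 1
    return (6 * goals_home + (shots_home - goals_home),
            6 * goals_away + (shots_away - goals_away))
```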

For the Heretical methodology, we proceed round by round, using exactly the same team scoring model but with team ratings and venue performance values (and home and away team conversion rates) that are updated based on the simulated results of previous rounds.

Team ratings under the Heretical methodology therefore follow a trajectory across the season, so the simulations for Round X within a replicate are necessarily correlated with those for Round X-1, Round X-2, and so on. This is not the case with the Standard methodology, for which the perturbations in any round are unrelated to those in any other round.

That, to be fair, is a selling point for the Heretical methodology, but it comes at a price: it results in desperately slow code (at least for someone with my abilities).
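The structural difference - and the reason for the slowness - is easiest to see in the shape of the replicate loop, sketched below using the simulate_game and perturbed_rating helpers from earlier. The shots_mean and update_ratings functions are purely hypothetical stand-ins (the real MoSH2020 mapping from ratings to expected scoring shots, and its rating update, aren’t shown here), and the fixture is simplified to one game per round.

```python
def shots_mean(rating, base=25.0, scale=0.1):
    # Hypothetical mapping from a team rating to expected scoring shots
    return base + scale * rating

def update_ratings(ratings, home, away, score_home, score_away, k=0.02):
    # Hypothetical Elo-style nudge towards the simulated result - a
    # stand-in for the real MoSH2020 rating update
    new = dict(ratings)
    margin = score_home - score_away
    new[home] += k * margin
    new[away] -= k * margin
    return new

def replicate_standard(fixture, ratings):
    # Standard: ratings stay fixed for the whole replicate; each game needs
    # only the pre-season ratings plus its own perturbation, so games are
    # independent and easily vectorised or parallelised
    return [simulate_game(
                shots_mean(perturbed_rating(ratings[home], rnd)),
                shots_mean(perturbed_rating(ratings[away], rnd)))
            for rnd, (home, away) in enumerate(fixture, start=1)]

def replicate_heretical(fixture, ratings):
    # Heretical: ratings are re-estimated from each simulated result before
    # the next game is simulated, so games must be processed strictly in
    # sequence within a replicate
    ratings, results = dict(ratings), []
    for home, away in fixture:
        s_home, s_away = simulate_game(shots_mean(ratings[home]),
                                       shots_mean(ratings[away]))
        ratings = update_ratings(ratings, home, away, s_home, s_away)
        results.append((s_home, s_away))
    return results
```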

Consequently, we have 50,000 simulation replicates for the Standard method, but only 2,500 for the Heretical method, the obvious impact being that the probability estimates from the Heretical method suffer from substantially more sampling variation (about 4.5 times as much, roughly the square root of the 20-fold difference in replicate counts).

LADDER FINISHES

Here, firstly, are the projections for teams’ ladder finishes. The results from the Standard method are on the left, and those from the Heretical method on the right.

At a macro level, the results are remarkably similar. When ordered by expected wins, no team’s ranking except Collingwood’s differs by more than a single spot. We do, however, get a little more spread in the range of expected wins under the Heretical method, with the gap between Richmond and Gold Coast about 9.3 wins under this method compared to 6.7 wins under the Standard method.

The probability estimates for each team for making the Top 8 or Top 4, or for finishing as Minor Premiers, are also very similar.

Also, under both methodologies, most teams are assessed as having reasonable chances at finishing in quite a wide range of ladder positions.

TEAM AND POSITION CONCENTRATION

There are a number of ways of measuring how much uncertainty there is in the final ladder, including the Gini measure of concentration, which we’ve used for each of the past few seasons.

One of the challenges with that measure in practice, however, is in its interpretation. We know that a team with a Gini coefficient of 0.8 has a narrower set of likely final finishes than a team with a Gini coefficient of 0.7, but linking that to anything more interpretable is difficult.

So, this year, I’m switching to the Herfindahl-Hirschman Index (HHI) because its inverse has a fairly straightforward interpretation: in the case of the index for a team it can be thought of as the number of ladder positions for which a team is effectively competing, and in the case of the index for a ladder position, it can be thought of as the number of teams effectively competing for that spot.

The HHI is by no means perfect, but I think that its interpretability has a lot going for it.
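As a quick illustration of that interpretation: the HHI for a team is just the sum of its squared probabilities across the 18 ladder positions, and its reciprocal is the “effective number” of positions it is competing for (the same calculation applies, by symmetry, to a ladder position and the 18 teams). A minimal sketch:

```python
import numpy as np

def effective_number(probs):
    # HHI = sum of squared probabilities; its reciprocal is the effective
    # number of ladder positions a team is competing for (or, symmetrically,
    # the effective number of teams competing for a position)
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()          # accept raw replicate counts or probabilities
    return 1.0 / np.sum(p ** 2)

print(effective_number([1] + [0] * 17))   # certain of one spot   -> 1.0
print(effective_number([1 / 18] * 18))    # could finish anywhere -> 18.0
```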

So, what do we get from the most recent simulation replicates for the two methods? The results appear below, with those from the Standard methodology on the left, and those from the Heretical methodology on the right.

STANDARD METHODOLOGY - 50,000 REPLICATES

HERETICAL METHODOLOGY - 2,500 REPLICATES


Again the results are clearly very similar, both in terms of how many ladder positions each team is effectively competing for, and how many teams are effectively competing for each position.

Both methods suggest that most teams are effectively competing for between 13 and 17 different ladder positions, and that most ladder positions have effectively between 13 and 17 teams competing for them. The exceptions amongst the teams are Richmond and Gold Coast, and amongst the ladder positions 1st, 17th, and 18th.

It’ll be interesting to see how rapidly these effective-competitor numbers fall this season. The trajectories based on the 2019 season simulation replicates (which start at Round 5) are shown below.

GAME IMPORTANCE

One area where we might expect to see some divergence between the methods is in their estimates of game importance (see this blog for details about how these are calculated).
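For context, one common way of operationalising importance from simulation replicates - not necessarily the exact MoS definition, which is described in the linked blog - is to measure how much teams’ finals probabilities shift between the replicates where a given game’s home team won and those where it lost. A rough sketch, assuming a hypothetical replicate-level data layout:

```python
import pandas as pd

def game_importance(team_outcomes: pd.DataFrame, game_results: pd.DataFrame,
                    game_id: int) -> float:
    # team_outcomes: one row per (replicate, team) with a boolean `made_top8`
    # game_results: one row per (replicate, game_id) with a boolean `home_win`
    # Both schemas are hypothetical - purely for the sake of the sketch
    res = game_results.loc[game_results["game_id"] == game_id,
                           ["replicate", "home_win"]]
    merged = team_outcomes.merge(res, on="replicate")

    # P(Top 8) for each team, conditional on each branch of the game's result
    p = merged.groupby(["team", "home_win"])["made_top8"].mean().unstack()

    # Importance: the total absolute shift in finals chances across all teams
    return float((p[True] - p[False]).abs().sum())
```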

Here, firstly, is the list of the 25 most-important games in terms of their estimated influence on the composition of the finalists, according to the Standard methodology’s simulation replicates.

And here is the list according to the Heretical methodology.

Whilst there are obvious differences - the absolute sizes of the weighted importance values are much higher for the Heretical method, for example, though that might be a consequence of its greater sampling variation - 10 games appear among the Top 25 for both methodologies and, though not shown here, the correlation between the 198 game-by-game weighted importance measures for the two methodologies is +0.81.

Both methodologies also show a tendency to assess games later in the season as more important, with both having 14 of their Top 25 games coming from Rounds 16 to 23, and none from Rounds 1 to 3. The charts below track the two methods’ average importance ratings on a round-by-round basis, and show broadly similar trajectories as the season progresses, though with more noise for the Heretical methodology.

STANDARD METHODOLOGY

HERETICAL METHODOLOGY


One of the by-products of the Standard approach is that it tends to drag games in the distant future nearer to 50:50 propositions, because it injects larger amounts of variability into them, and variability favours the would-be underdog. The Heretical methodology, by contrast, produces the opposite trajectory.
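To see why, consider a simple normal model of the final margin (an illustrative stand-in, not the simulation’s own scoring model): holding the expected margin fixed, increasing the margin’s standard deviation pulls the favourite’s win probability back towards 50%.

```python
from math import erf, sqrt

def win_prob(expected_margin, margin_sd):
    # P(margin > 0) under a normal model of the final margin
    return 0.5 * (1 + erf(expected_margin / (margin_sd * sqrt(2))))

# The same 18-point favourite, with progressively more uncertainty added
# for more-distant games
for sd in (36, 48, 60):
    print(sd, round(win_prob(18, sd), 3))   # 0.691, 0.646, 0.618
```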

STANDARD METHODOLOGY

HERETICAL METHODOLOGY


AVERAGE FAVOURITE PROBABILITY BY ROUND - TAB BOOKMAKER PRICES (2006 TO 2019)

Interestingly, the Heretical results do a better job of mimicking reality on this particular aspect, as you can see from the chart of TAB prices for the 2006 to 2019 period at right.

There are at least three potential contributory causes I can think of for this phenomenon:

  • The scheduling is done in such a way that games later in the season are more likely to be mismatches

  • The spread of team abilities tends to widen as the season progresses, which purely by chance increases the relative likelihood of late-season mismatches

  • There are differential incentives for teams at the back-end of a season, which manifest in predictable differences in discretionary effort

I’d suggest that the first of those is unlikely, the second plausible, and the third very plausible, despite protestations to the contrary.

While undertaking the analysis for this blog, and on a somewhat related note, I was curious about the probabilities attached to return contests when teams meet twice in the same season. I thought that regression to the mean of team abilities, and the fact that the return match will (usually) be played at the home venue of the other team, might lead to a relatively large number of swaps in favouritism, but the data does not bear that out at all.

On the contrary, in over 70% of cases the team that was favoured in the initial contest was also favoured in the return match later in that same season. Of the 27% of cases where there was a swap, 22% saw the home team favoured in both games - something that is only possible because the return match is hosted by the other team.

SUMMARY AND CONCLUSION

We’ve seen that the outputs of the Standard and Heretical methods are, generally, very similar - much more similar than I assumed going into this analysis.

Based on that similarity:

  • The apparent timidity of the results from the Standard methodology now seems far more reasonable to me. Maybe it is a fair probabilistic assessment that most teams could plausibly finish in any of 13 or more spots on the ladder

  • Also, teams’ finals, Top 4 and Minor Premiership chances would appear to be more a function of opinions about relative team strengths, venue effects, and the fixture, than of the simulation methodology

  • I’m heartened that the game importance analysis looks to be genuinely finding something fundamental about the differential importance of games and, likewise, isn’t mostly an artefact of the simulation methodology

  • I’m happy that the Heretical methodology (at least as implemented by me) doesn’t offer any materially significant advantages over the Standard methodology, and certainly none that can offset its slowness

Over the course of the season, assuming there is one, I’ll compare the outputs of the two methodologies again, to see if anything might change my mind.