The Ideal Competition: How Many Blowouts and Upsets?

November 19, 2015 Tony Corke

Working on a few recent posts here on the Statistical Analysis journal has made me think a lot more about blowouts (games won by a large margin) and upsets (games won by the team less-favoured to win pre-game), and realise how inter-related is their prevalence. At least, that's an inevitable conclusion of a belief that the result of a game in the AFL can be adequately modelled as a realisation of a Normally distributed random variable.

In today's blog I'll be using this statistical characterisation of the result of a game to quantify, via simulation, the tradeoff between the propensity for a game to finish as an upset or as a blowout. To do that I need values to use in the simulations for the two parameters of the Normal distribution: its mean and its standard deviation.

Empirically, over the past 10 seasons, the average handicap posted by the TAB Sportsbet Bookmaker has been 22 points, which we can think of as the average difference in ability between the competing teams after adjusting for any home ground, interstate travel, or other effects. This is the value that we use for the mean of the Normal distribution, and is the value about which we expect game margins to vary in our theoretical competition. I'll use the simulations to investigate what would happen if we varied this average difference from a low of 18 points to a high of 26 points.

That leaves only the standard deviation of game margins to determine. Analyses here on MatterOfStats have found a number of different values for the standard deviation of handicap-adjusted game margins (for example, this post and this, more theoretical post), with 36 points emerging as something of a consensus value from these analyses and subsequent practical experience. For the simulations we'll allow this parameter to vary between 32 and 40 points.

In all then, we perform 81 simulations, each using one of the 9 "ability difference" values (ie the integers from 18 to 26) and one of the 9 standard deviations (ie the integers from 32 to 40). Each simulation comprises 1,000,000 replicates and is used to derive estimates of two statistics:

the proportion of games where the final margin is more than 10 goals ("blowouts")
the proportion of games where the less-favoured team wins ("upsets")
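As a rough illustration of one such simulation, here is a minimal sketch in Python. It is not the code used for the results in this post. It assumes that margins are measured from the favourite's point of view (so a negative margin is an upset), that a blowout is any game decided by more than 60 points (10 goals at 6 points each) regardless of which team wins, and the names simulate and BLOWOUT_THRESHOLD are purely illustrative.

```python
import random

BLOWOUT_THRESHOLD = 60  # assumption: 10 goals at 6 points per goal


def simulate(ability_gap, margin_sd, n_reps=1_000_000, seed=0):
    """Estimate (blowout_rate, upset_rate) for one scenario.

    Margins are drawn from the favourite's point of view, so a negative
    margin means the less-favoured team won.
    """
    rng = random.Random(seed)
    blowouts = 0
    upsets = 0
    for _ in range(n_reps):
        margin = rng.gauss(ability_gap, margin_sd)
        # Assumption: a blowout counts whichever team wins by the big margin.
        if abs(margin) > BLOWOUT_THRESHOLD:
            blowouts += 1
        if margin < 0:
            upsets += 1
    return blowouts / n_reps, upsets / n_reps


# A single scenario, with fewer replicates to keep the run short.
blowout_rate, upset_rate = simulate(18, 32, n_reps=200_000)
print(f"blowouts {blowout_rate:.3f}, upsets {upset_rate:.3f}")
```

Looping simulate over range(18, 27) for the ability difference and range(32, 41) for the standard deviation would cover all 81 scenarios, though a pure-Python loop at 1,000,000 replicates per scenario will be slow, and a vectorised approach would be the more practical choice at that scale.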

The chart below summarises all 81 simulation scenarios.

The top line, for example, summarises the nine scenarios for which the ability difference was at its lowest (ie 18 points), with the assumed standard deviation of final margins around that difference increasing as we move from left to right. This line suggests that a standard deviation of 32 points and an ability difference of 18 points will, on average, produce games that are upsets - that is, won by the weaker team - about 29% of the time, and blowouts about 10% of the time. Each 1-point increase in the standard deviation lifts the proportion of blowouts by about 0.8 to 0.9 percentage points, but lifts the proportion of upsets by only 0.4 to 0.6 percentage points.
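Because the model is just a Normal distribution, these proportions can also be cross-checked without simulation. The sketch below uses the Normal cumulative distribution function from Python's standard library, under the same assumptions as the text: margins are measured from the favourite's point of view, and a blowout is a margin of more than 60 points in either direction. The function name exact_rates is illustrative only.

```python
from statistics import NormalDist


def exact_rates(ability_gap, margin_sd, blowout_threshold=60):
    """Return (blowout, upset) probabilities under the Normal model."""
    dist = NormalDist(mu=ability_gap, sigma=margin_sd)
    # Upset: the favourite's margin is below zero.
    upset = dist.cdf(0)
    # Blowout: either team wins by more than the threshold.
    blowout = dist.cdf(-blowout_threshold) + (1 - dist.cdf(blowout_threshold))
    return blowout, upset


# The nine scenarios with an ability difference of 18 points.
for sd in range(32, 41):
    blowout, upset = exact_rates(18, sd)
    print(f"sd={sd}: blowouts {blowout:.1%}, upsets {upset:.1%}")
```

For a standard deviation of 32 points this gives an upset probability of roughly 29% and a blowout probability of roughly 10%, in line with the simulated values quoted above.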

Broadly, these marginal effects as we vary the standard deviation for a given ability difference are the same for all values of ability difference - a fact demonstrated by the parallel nature of the lines in the chart.

So, what does this chart imply?

Well, firstly, it tells us that, if we hold the ability difference fixed, varying the standard deviation of game margins moves the proportion of upsets and the proportion of blowouts in the same direction: lifting the standard deviation increases both, and lowering it decreases both.

Alternatively, if ability difference is variable we can achieve more upsets and fewer blowouts by reducing it, even with the same standard deviation. Of course, by reducing the average difference in ability between the competing teams we change the nature of upsets, making them less surprising because they're being achieved by a team of higher relative ability.
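The two levers can be compared directly under the Normal model. The sketch below is illustrative only: it assumes a 60-point blowout threshold applied to either team's winning margin, and the particular values compared (a 22-point gap, a 36-point standard deviation, and the ends of each simulated range) are chosen just to show the direction of each effect.

```python
from statistics import NormalDist


def rates(ability_gap, margin_sd, blowout_threshold=60):
    """Return (blowout, upset) probabilities under the Normal model."""
    d = NormalDist(ability_gap, margin_sd)
    blowout = d.cdf(-blowout_threshold) + 1 - d.cdf(blowout_threshold)
    upset = d.cdf(0)
    return blowout, upset


# Lever 1: hold the gap at 22 points and widen the standard deviation.
# Both proportions move up together.
print("sd 32 -> 40:", rates(22, 32), rates(22, 40))

# Lever 2: hold the standard deviation at 36 points and shrink the gap.
# Blowouts fall while upsets rise.
print("gap 26 -> 18:", rates(26, 36), rates(18, 36))
```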

Reducing the average ability gap could, in practice, be achieved by any measure designed to equalise team abilities such as the draft. Since this gap, as we've used it here, includes home ground and travel effects, it could also be achieved by diminishing the importance of home grounds (say by having teams share home grounds, as some now do) or by improving teams' ability to perform after travelling large distances.

If we're at a point where further reductions in the average ability gap cannot be achieved, then we're left with attempting to manipulate the variability of game margins and will be unable to simultaneously increase the number of upsets and reduce the number of blowouts. Instead, we could only target the combination of blowouts and upsets that we find most appealing. Achieving such an optimum will require either lowering or raising the standard deviation of game margins from its current value, but that will likely prove difficult to engineer, the standard deviation being something of an "emergent property" of the competition.

There is, however, some weak evidence that the variability of game margins is positively related to total scoring. So, one possible approach to lowering the standard deviation of game margins (thereby reducing blowouts and upsets) would be to adopt rules that make scoring more difficult, and one possible approach to lifting the standard deviation (thereby increasing blowouts and upsets) would be to adopt rules that encourage scoring. Empirical evidence suggests that fairly dramatic changes would be necessary to have a perceptible influence.

Individual teams, of course, might be capable of altering the variability of their own game results by the extent to which they play controlled, predictable football (lowering the standard deviation) or play free form, unpredictable football (lifting the standard deviation). As we've noted here before on MatterOfStats, the former strategy is advantageous to stronger teams, the latter to weaker teams.

SOME CONCLUDING COMMENTS

For me, this analysis is interesting because it provides a context for the roles that team equalisation policies, home ground arrangements, and rule changes might play in achieving a competition with a more desirable mix of blowouts and upsets. These two metrics are, of course, not the only bases on which fans will judge how enjoyable a season is, but I think they do capture a pair of very important, salient aspects.

Fans, for the most part, like to see upsets often enough that they can rationally believe most teams have a realistic chance of winning most weeks, and they dislike watching contests where the result is determined long before the final siren sounds. It would be interesting to conduct a survey amongst those fans to understand their preferences around these outcome types.