Matter of Stats

The Case of the Missing Margins (Are 12 to 24-point Margins Too Rare?)

The analysis used in this blog was originally created as part of a Twitter conversation about the ability of good teams to "win the close ones" (a topic we have investigated before here on MoS - for example in this post and in this one). As a first step in investigating that question, I thought it would be useful to create a cross-tab of historical V/AFL results based on the final margin in each game and the level of pre-game favouritism.

I don't have bookmaker data that I trust going back beyond 2006, so I thought I'd use MoSSBODS to assess the level of pre-game favouritism for the full historical analysis. 

Now, in the cross-tab I'd like to record the number of teams whose results fell into the relevant category, but also how many of the results we might have expected to fall into that category given MoSSBODS' pre-game assessment of the relative team strengths, adjusted for venue.

In this first table I do this by assuming that actual margins follow a Normal Distribution with a mean equal to MoSSBODS' expected margin, and a standard deviation calculated from MoSSBODS' margin errors in the relevant year (which range from about 26.5 points in the very early VFL years to the mid 40s around the 1980s).
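As a rough sketch of how the "Expected" entries could be built under this Normal assumption, the Python below sums each game's probability of landing in each margin range. The half-point bucket edges (a continuity correction, so that a draw gets a non-zero probability) and the function names are assumptions made for illustration, not details spelled out above.

```python
from statistics import NormalDist

# Bucket edges in points, from one team's perspective. The half-point
# continuity correction is an assumption made for this sketch.
EDGES = [-24.5, -11.5, -0.5, 0.5, 11.5, 24.5]
LABELS = ["loss 25+", "loss 12-24", "loss 1-11", "draw",
          "win 1-11", "win 12-24", "win 25+"]

def bucket_probabilities(expected_margin, sd):
    """Probability of each margin range when the margin is
    Normal(expected_margin, sd)."""
    dist = NormalDist(mu=expected_margin, sigma=sd)
    cdfs = [0.0] + [dist.cdf(edge) for edge in EDGES] + [1.0]
    return {label: hi - lo
            for label, lo, hi in zip(LABELS, cdfs[:-1], cdfs[1:])}

def expected_counts(games):
    """Sum the bucket probabilities over (expected_margin, sd) pairs to
    get the expected number of results in each margin range."""
    totals = dict.fromkeys(LABELS, 0.0)
    for expected_margin, sd in games:
        for label, p in bucket_probabilities(expected_margin, sd).items():
            totals[label] += p
    return totals
```

Passing one `(expected_margin, sd)` pair per team-game, with the standard deviation taken from the relevant season, gives expected counts that can be set beside the actual counts in each column.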

The result is shown below.

We see then that, for example, of the 4,285 teams that have started as more than 4-goal underdogs, 2,703 of them ended up losing by more than 4 goals. That's about 30 teams more than we'd have expected given our assumption about the distribution of actual margins around their expected value.

Now if we focus on the bottom row we can see that:

  • greater than 4-goal losses (and greater than 4-goal wins, because this table is, by construction, symmetric) have been about 2% more common than we'd expect
  • 1 to 11 point losses (and wins) have been about 2.5% more common than we'd expect
  • draws have also been about 2.5% more common than we'd expect
  • 12 to 24 point losses (and wins) have been about 7% less common than we'd expect
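The percentages here are read as the gap between actual and expected counts, expressed relative to the expected count. That formula is an assumption about how the figures were derived, and the helper name is hypothetical. As a check on the single cell discussed earlier (not the bottom row), 2,703 actual results against roughly 2,673 expected works out to an excess of about 1%:

```python
def pct_excess(actual, expected):
    """Percentage by which the actual count exceeds (+) or falls
    short of (-) the expected count."""
    return 100.0 * (actual - expected) / expected

# 2,703 actual heavy losses were "about 30" more than expected, so the
# expected count is taken here as roughly 2,673 (an approximation).
print(round(pct_excess(2703, 2673), 1))  # 1.1
```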

Looking up and down the column of data for the 2-4 goal wins (and losses) reveals that the shortfall is largely independent of how strong or weak the pre-game favourite was. The exceptions are games where there were equal favourites - which I've defined as games where MoSSBODS' expected margin was under half a point - and games where there was a better than 4-goal favourite. In those cases we saw roughly as many 2-4 goal victories (losses) as we'd have expected.

There are other features of this table that I think are interesting and probably worth exploring another day, but for today I'm going to stay with the apparently "missing" 2-4 goal margins.

THE MODERN ERA

Okay, maybe the result we've just seen is somehow an artefact of the early parts of VFL history - I can't posit exactly how or why, but let's just say it'd be a less interesting phenomenon if that were the case. So, let's restrict our analysis to the seasons from 2000 to 2016.

We now have that:

  • greater than 4-goal losses (and wins) have been about 3% more common than we'd expect
  • 12 to 24 point losses (and wins) have been about 15% less common than we'd expect
  • 1 to 11 point losses (and wins) have been about 7% more common than we'd expect
  • draws have been about 3.5% less common than we'd expect

Far from eliminating the "missing margins" phenomenon, focussing on the modern era exacerbates it. And, now, the phenomenon occurs for games with every level of pre-game favouritism excepting equal favourites.

Curious.

But, there is another way of constructing the distribution of actual margins given MoSSBODS' pre-game views about the strength of the teams, and that is by using the team scoring model I fitted back in 2014, which allows us to create a distribution of margins for every combination of pre-game expected home and away team scoring shot levels. That model produces distributions that are very Normal-like, though they're not Normal and they are also heteroskedastic in that the standard deviation of margins around the expected value tends to increase with the number of scoring shots. 

So, if we use this model to simulate 10,000 games under all combinations of home team and away team scoring shot levels, we can then estimate, for every game, how likely each victory margin range was. This will allow us to fill in the values in the "Expected" columns of the table. (For simplicity and for some other uninteresting reasons, I used the same parameter values for home and away teams in these simulations, but that makes little difference to the outputs.)
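To make the simulation idea concrete, here is a deliberately simplified Python sketch. It is not the 2014 team scoring model. It assumes scoring shots are rounded, negatively correlated Normal draws and that each shot becomes a goal with a fixed probability. The values of `shot_sd` and `conversion` are illustrative guesses, while the default `rho` of -0.25 echoes the scoring shot error correlation quoted later in this post.

```python
import random
from collections import Counter

def bucket(margin):
    """Classify an integer home-team margin into the post's ranges."""
    size = abs(margin)
    if size == 0:
        return "draw"
    side = "win" if margin > 0 else "loss"
    if size <= 11:
        return f"{side} 1-11"
    if size <= 24:
        return f"{side} 12-24"
    return f"{side} 25+"

def simulate_margin(rng, home_shots, away_shots,
                    shot_sd=5.0, rho=-0.25, conversion=0.53):
    """One simulated home-team margin. All distributional choices and
    parameter values are assumptions made for this sketch."""
    z_home = rng.gauss(0, 1)
    # Build a second standard Normal shock with correlation rho to the first.
    z_away = rho * z_home + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
    scores = []
    for mean_shots, z in ((home_shots, z_home), (away_shots, z_away)):
        shots = max(0, round(mean_shots + shot_sd * z))
        goals = sum(rng.random() < conversion for _ in range(shots))
        scores.append(6 * goals + (shots - goals))  # 6 per goal, 1 per behind
    return scores[0] - scores[1]

def bucket_frequencies(home_shots, away_shots, n_sims=10_000, seed=1):
    """Estimated probability of each margin range for one combination
    of expected home and away scoring shots."""
    rng = random.Random(seed)
    counts = Counter(bucket(simulate_margin(rng, home_shots, away_shots))
                     for _ in range(n_sims))
    return {label: n / n_sims for label, n in counts.items()}
```

Calling `bucket_frequencies(25, 25)` estimates the range probabilities for one evenly matched pairing. Repeating the call over a grid of scoring shot levels and looking up each game's pairing would give "Expected" values in the spirit of the approach described above.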

This new approach yields the next table, which suggests that:

  • greater than 4-goal losses (and wins) have been about 4% more common than we'd expect
  • 12 to 24 point losses (and wins) have been about 16% less common than we'd expect
  • 1 to 11 point losses (and wins) have been about 5% more common than we'd expect
  • draws have been about 5% less common than we'd expect

That's not helping at all. Those 2-4 goal margins are still missing.

Using a higher level of correlation (about double, in fact) between home team and away team scoring shots improves the agreement between actual and expected wins (and losses) of more than 4 goals (now a 0.7% excess) and shrinks the shortfall in 2-4 goal games (now a 12% deficit), but at the expense of driving up the apparent oversupply of 1 to 11 point wins (now an 11% excess).

That change, on balance, appears to mostly drive up the proportion of blowout victories (as you'd expect), and isn't based on the latest empirical data anyway. The correlation between the modelled errors in home team and away team scoring shots for all games in the 2000-2016 period is -0.25 - virtually identical to the -0.24 from the model fitted in 2014.

Still missing then.

BOOKMAKER EXPECTATIONS

MoSSBODS is good, but far from perfect, at estimating pre-game team scores and margins, and the TAB Bookmaker (amongst a long list of others) is undeniably better.

Lastly then, let's use the data I have for the TAB from 2006 to 2016 to estimate the expected margin in each game. For the most part, I've simply used the negative of the handicap in the line market for this purpose, though I've made adjustments in two situations: in games where the handicap is under 7 points (because, previously, the TAB would set a minimum handicap of 6.5 points and would adjust the prices on offer, which meant that they didn't really expect a 6.5 point margin), and in games where the difference between the handicap and that implied by the head-to-head prices I have for the same game is too large.

In total, fewer than one game in five had its handicap adjusted and, in those games where it was adjusted, the average absolute change was about 2.4 points.

These expected margins have been converted to actual margin distributions in the same way as MoSSBODS' margin opinions were for the very first table - by assuming that actual margins follow a Normal Distribution with a mean equal to the TAB's expected margin, and a standard deviation calculated from the TAB's margin errors in the relevant year (which range from about 33 to 39 points across the 11 seasons).
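The per-season standard deviations might be computed along the lines of the Python sketch below. It uses the unadjusted rule that the expected margin is the negative of the line handicap. Measuring spread with the population standard deviation of the errors (rather than, say, a root mean square about zero) is an assumption, as are the function names.

```python
from collections import defaultdict
from statistics import pstdev

def expected_margin_from_handicap(handicap):
    """A team given a handicap of -12.5 is expected to win by 12.5 points."""
    return -handicap

def yearly_error_sd(games):
    """games: iterable of (season, handicap, actual_margin) tuples, all
    from the same team's perspective. Returns the standard deviation of
    the margin errors (actual minus expected) for each season."""
    errors = defaultdict(list)
    for season, handicap, actual_margin in games:
        expected = expected_margin_from_handicap(handicap)
        errors[season].append(actual_margin - expected)
    # pstdev is one possible choice of estimator (an assumption).
    return {season: pstdev(errs) for season, errs in errors.items()}
```

Each season's value would then serve as the standard deviation of the Normal distribution centred on that game's expected margin.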

This approach suggests that:

  • greater than 4-goal losses (and wins) have been about 2% more common than we'd expect
  • 12 to 24 point losses (and wins) have been about 12.5% less common than we'd expect
  • 1 to 11 point losses (and wins) have been about 8% more common than we'd expect
  • draws have been about 8% more common than we'd expect

A range of approaches then suggests the apparently missing 2 to 4-goal margins might really be missing.

SUMMARY AND CONCLUSION

However we analyse it - whether we use MoSSBODS or the TAB to set estimated margins, and whether, when using MoSSBODS, we assume that actual margins follow a Normal distribution or are best modelled based on an individual team scoring model - historical results seem to have too few margins of between 2 and 4 goals.

There are at least two reasons why this might be the case:

  1. None of the modelling approaches adequately encapsulates - in a statistical sense - the manner in which team scores and margins relate to pre-game expected scores and margins. In other words, neither the Normal distribution nor my team scoring model provides an appropriately accurate statistical surrogate for the real world.
  2. Teams, in real life, are motivated and hence behave in such a way that final margins in the 2 to 4-goal range become underrepresented. We might hypothesise, for example, that teams leading a match feel that only a margin in excess of 4 goals is "comfortable", and so strive to create margins at least slightly higher than this, and that, conversely, teams trailing by 12 to 24 points feel that they are "close enough" to victory to spur further effort, which tends to drive the margin outside this range.
    In other words, a 12 to 24 point margin is a relatively "unstable" outcome, and one which will motivate both teams to move the final margin outside that range.

Readers, of course, might be able to come up with other explanations, which I'd love to hear.

But for now, for me, it remains an unresolved and curious issue ...