Grand Final History: A Look at Ladder Positions

Of the 111 Grand Finals in VFL/AFL history - excluding the two replays - only 18, or about 1-in-6, have seen the team finishing 1st on the home-and-away ladder play the team finishing 3rd.

This year, of course, will be the nineteenth.

Far more common, as you'd expect, has been a matchup between the teams from 1st and 2nd on the ladder. This pairing accounts for 56 Grand Finals - a smidgen over half - and has been so frequent partly because of the benefits that the various finals systems have accorded to teams finishing in these positions, and partly, no doubt, because these two teams have tended to be the best two teams.

2010 - Grand Final Results by Ladder Position.png

In the 18 Grand Finals to date that have involved the teams from 1st and 3rd, the minor premier has an 11-7 record, which represents a 61% success rate. This is only slightly better than the minor premiers' record against teams coming 2nd, which is 33-23 or about 59%.

Overall, the minor premiers have missed only 13 of the Grand Finals and have won 62% of those they've been in.

By comparison, teams finishing 2nd have appeared in 68 Grand Finals (61%) and won 44% of them. In only 12 of those 68 appearances have they faced a team from lower on the ladder; their record for these games is 7-5, or 58%.

Teams from 3rd and 4th positions have each made about the same number of appearances, winning a spot about 1 year in 4. Whilst their rates of appearance are very similar, their success rates are vastly different, with teams from 3rd winning 46% of the Grand Finals they've made, and those from 4th winning only 27% of them.

That means that teams from 3rd have a better record than teams from 2nd, largely because teams from 3rd have faced teams other than the minor premier in 25% of their Grand Final appearances whereas teams from 2nd have found themselves in this situation for only 18% of their Grand Final appearances.

Ladder positions 5 and 6 have provided only 6 Grand Finalists between them, and only 2 Flags. Surprisingly, both wins have been against minor premiers - in 1998, when 5th-placed Adelaide beat North Melbourne, and in 1900 when 6th-placed Melbourne defeated Fitzroy. (Note that the finals systems have, especially in the early days of footy, been fairly complex, so not all 6ths are created equal.)

One conclusion I'd draw from the table above is that ladder position is important, but only mildly so, in predicting the winner of the Grand Final. For example, only 69 of the 111 Grand Finals, or about 62%, have been won by the team finishing higher on the ladder.

It turns out that ladder position - or, more correctly, the difference in ladder position between the two grand finalists - is also a very poor predictor of the margin in the Grand Final.

2010 - Grand Final Results by Ladder Position - Chart.png

This chart shows that the expected margin in favour of the higher-placed team does increase as the gap between the teams' ladder positions widens, but only by about half a goal per ladder position.

What's more, this difference explains only about half of one per cent of the variability in that margin.

Perhaps, I thought, more recent history would show a stronger link between ladder position difference and margin.

2010 - Grand Final Results by Ladder Position - Chart 2.png

Quite the contrary, it transpires. Looking just at the last 20 years, an increase in the difference of 1 ladder position has been worth only 1.7 points in increased expected margin.

Come the Grand Final, it seems, some of your pedigree follows you onto the park, but much of it wanders off for a good bark and a long lie down.

Adding Some Spline to Your Models

Writing the recent blog on predicting the Grand Final margin based on the difference in the teams' MARS Ratings set me off once again down the path of building simple models to predict game margins.

It usually doesn't take much.

Firstly, here's a simple linear model using MARS Ratings differences that repeats what I did for that recent blog post but uses every game since 1999, not just Grand Finals.

2010 - MARS Ratings vs Score Difference.png

It suggests that you can predict game margins - from the viewpoint of the home team - by completing the following steps:

  1. subtract the away team's MARS Rating from the home team's MARS Rating
  2. multiply this difference by 0.736
  3. add 9.871 to the result from step 2.
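
To make those steps concrete, here's a minimal sketch in R; the coefficients come from the fitted model above, while the Ratings in the example call are invented purely for illustration:

    # Steps 1 to 3 above as a function. The Ratings passed in are hypothetical.
    predict_margin <- function(home_mars, away_mars) {
      rating_diff <- home_mars - away_mars   # step 1
      0.736 * rating_diff + 9.871            # steps 2 and 3
    }
    predict_margin(1010, 1000)   # a 10-point Ratings gap gives about a 17-point margin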

One interesting feature of this model is that it suggests that home ground advantage is worth about 10 points.

The R-squared number that appears on the chart tells you that this model explains 21.1% of the variability in game margins.

You might recall we've found previously that we can do better than this by using the home team's victory probability implied by its head-to-head price.

2010 - Bookie Probability vs Score Difference.png

This model says that you can predict the home team margin by multiplying its implicit probability by 105.4 and then subtracting 48.27. It explains 22.3% of the observed variability in game margins, or a little over 1% more than we can explain with the simple model based on MARS Ratings.

With this model we can obtain another estimate of the home team advantage by forecasting the margin with a home team probability of 50%. That gives an estimate of 4.4 points, which is much smaller than we obtained with the MARS-based model earlier.

(EDIT: On reflection, I should have been clearer about the relative interpretation of this estimate of home ground advantage in comparison to that from the MARS Rating based model above. They're not measuring the same thing.

The earlier estimate of about 10 points is a more natural estimate of home ground advantage. It's an estimate of how many more points a home team can be expected to score than an away team of equal quality based on MARS Rating, since the MARS Rating of a team for a particular game does not include any allowance for whether or not it's playing at home or away.

In comparison, this latest estimate of 4.4 points is a measure of the "unexpected" home ground advantage that has historically accrued to home teams, over-and-above the advantage that's already built into the bookie's probabilities. It's a measure of how many more points home teams have scored than away teams when the bookie has rated both teams as even money chances, taking into account the fact that one of the teams is (possibly) at home.

It's entirely possible that the true home ground advantage is about 10 points and that, historically, the bookie has priced only about 5 or 6 points into the head-to-head prices, leaving the excess of 4.4 that we're seeing. In fact, this is, if memory serves me, consistent with earlier analyses that suggested home teams have been receiving an unwarranted benefit of about 2 points per game on line betting.

Which, again, is why MAFL wagers on home teams.)

Perhaps we can transform the probability variable and explain even more of the variability in game margins.

In another earlier blog we found that the handicap a team received could be explained by using what's called the logit transformation of the bookie's probability, which is ln(Prob/(1-Prob)).

Let's try that.

2010 - Bookie Probability vs Score Difference - Logit Form.png

We do see some improvement in the fit, but it's only another 0.2% to 22.5%. Once again we can estimate home ground advantage by evaluating this model with a probability of 50%. That gives us 4.4 points, the same as we obtained with the previous bookie-probability based model.
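
For what it's worth, here's a sketch in R of how such a fit might be produced. It assumes a data frame called games with columns margin (home score minus away score) and prob (the bookie-implied home team probability) - both names are mine, not anything from the original analysis:

    # Logit-transform the bookie probability, then fit by ordinary least squares.
    games$logit_prob <- log(games$prob / (1 - games$prob))
    fit <- lm(margin ~ logit_prob, data = games)
    summary(fit)$r.squared                               # about 0.225 on the data described here
    predict(fit, newdata = data.frame(logit_prob = 0))   # the model's margin at Prob = 50%

At Prob = 50% the logit is zero, so the prediction is just the model's intercept, which is where an estimate of home ground advantage like the 4.4 points above comes from.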

A quick model-fitting analysis of the data in Eureqa gives us one more transformation to try: exp(Prob). Here's how that works out:

2010 - Bookie Probability vs Score Difference - Exp Form.png

We explain another 0.1% of the variability with this model as we inch our way to 22.6%. With this model the estimated home-ground advantage is 2.6 points, which is the lowest we've seen so far.

If you look closely at the first model we built using bookie probabilities you'll notice that there seem to be more points above the fitted line than below it for probabilities from somewhere around 60% onwards.

Statistically, there are various ways that we could deal with this, one of which is by using Multivariate Adaptive Regression Splines.

(The algorithm in R - the statistical package that I use for most of my analysis - with which I created my MARS models is called earth since, for legal reasons, it can't be called MARS. There is, however, another R package that also creates MARS models, albeit in a different format. The maintainer of the earth package couldn't resist calling the function that converts from one model format to the other mars.to.earth. Nice.)

The benefit that MARS models bring us is the ability to incorporate 'kinks' in the model and to let the data determine how many such kinks to incorporate and where to place them.

Running earth on the bookie probability and margin data gives the following model:

Predicted Margin = 20.7799 + if(Prob > 0.6898155, 162.37738 x (Prob - 0.6898155),0) + if(Prob < 0.6898155, -91.86478 x (0.6898155 - Prob),0)

This is a model with one kink at a probability of around 69%, and it does a slightly better job at explaining the variability in game margins: it gives us an R-squared of 22.7%.
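
If you're curious, fitting a model like this in R is a one-liner with the earth package, again assuming the hypothetical games data frame from earlier:

    # Fit a MARS model of margin on bookie probability; earth chooses
    # the number and location of the kinks (hinge functions) itself.
    library(earth)
    mars_fit <- earth(margin ~ prob, data = games)
    summary(mars_fit)   # lists the hinge terms and their coefficients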

When you overlay it on the actual data, it looks like this.

2010 - Bookie Probability vs Score Difference - MARS.png

You can see the model's distinctive kink in the diagram, by virtue of which it seems to do a better job of dissecting the data for games with higher probabilities.

It's hard to keep all of these models based on bookie probability in our heads, so let's bring them together by charting their predictions for a range of bookie probabilities.

2010 - Bookie Probability vs Score Difference - Predictions.png

For probabilities between about 30% and 70%, which approximately equates to prices in the $1.35 to $3.15 range, all four models give roughly the same margin prediction for a given bookie probability. They differ, however, outside that range of probabilities, by up to 10-15 points. Since only about 37% of games have bookie probabilities outside the 30% to 70% range, none of the models is penalised too heavily for producing errant margin forecasts at these more extreme probabilities.

So far then, the best model we've produced has used only bookie probability and a MARS modelling approach.

Let's finish by adding the other MARS back into the equation - my MARS Ratings, which bear no resemblance to the MARS algorithm, and just happen to share a name. A bit like John Howard and John Howard.

This gives us the following model:

Predicted Margin = 14.487934 + if(Prob > 0.6898155, 78.090701 x (Prob - 0.6898155), 0) + if(Prob < 0.6898155, -75.579198 x (0.6898155 - Prob), 0) + if(MARS_Diff < -7.29, 0, 0.399591 x (MARS_Diff + 7.29))

The model described by this equation is kinked with respect to bookie probability in much the same way as the previous model. There's a single kink located at the same probability, though the slope to the left and right of the kink is smaller in this latest model.

There's also a kink for the MARS Rating variable (which I've called MARS_Diff here), but it's a kink of a different kind. For MARS Ratings differences below -7.29 Ratings points - that is, where the home team is rated 7.29 Ratings points or more below the away team - the contribution of the Ratings difference to the predicted margin is 0. Then, for every 1 Rating point increase in the difference above -7.29, the predicted margin goes up by about 0.4 points.

This final model, which I think can still legitimately be called a simple one, has an R-squared of 23.5%. That's a further increase of 0.8%, which can loosely be thought of as the contribution of MARS Ratings to the explanation of game margins over and above that which can be explained by the bookie's probability assessment of the home team's chances.

Pies v Saints: An Initial Prediction

During the week I'm sure I'll have a number of attempts at predicting the result of the Grand Final - after all, the more predictions you make about the same event, the better your chances of generating at least one that's remembered for its accuracy, long after the remainder have faded from memory.

In this brief blog the entrails I'll be contemplating come from a review of the relationship between Grand Finalists' MARS Ratings and the eventual result for each of the 10 most recent Grand Finals.

Firstly, here's the data:

2010 - Grand Finals and MARS Ratings.png

In seven of the last 10 Grand Finals the team with the higher MARS Rating has prevailed. You can glean this from the fact that the rightmost column contains only three negative values indicating that the team with the higher MARS Rating scored fewer points in the Grand Final than the team with the lower MARS Rating.

What this table also reveals is that:

  • Collingwood are the highest-rated Grand Finalist since Geelong in 2007 (and we all remember how that Grand Final turned out)
  • St Kilda are the lowest-rated Grand Finalist since Port Adelaide in 2007 (refer previous parenthetic comment)
  • Only one of the three 'upset' victories from the last decade, where an upset is defined based on MARS Ratings, involved a MARS Rating differential larger than this year's. That was the Hawks' victory over Geelong in 2008, when the Hawks' MARS Rating was almost 29 points less than the Cats'

From the raw data alone it's difficult to determine if there's much of a relationship between the Grand Finalists' MARS Ratings and their eventual result. Much better to use a chart:

2010 - Grand Finals and MARS Ratings - Model.png

The dots each represent a single Grand Final and the line is the best fitting linear relationship between the difference in MARS Ratings and the eventual Grand Final score difference. As well as showing the line, I've also included the equation that describes it, which tells us that the best linear predictor of the Grand Final margin is that the team with the higher MARS Rating will win by a margin equal to about 1.06 times the difference in the teams' MARS Ratings less a bit under 1 point.

For this year's Grand Final that suggests that Collingwood will win by 1.062 x 26.1 - 0.952, which is just under 27 points. (I've included this in gray in the table above.)

One measure of the predictive power of the equation I've used here is the proportion of variability in Grand Final margins that it's explained historically. The R-squared of 0.172 tells us that this proportion is about 17%, which is comforting without being compelling.

We can also use a model fitted to the last 10 Grand Finals to create what are properly called prediction intervals for the final result. For example, we can say that there's a 50% chance that the result of the Grand Final will be in the range spanning a 5-point loss for the Pies to a 59-point win, which demonstrates just how difficult it is to create precise predictions when you've only 10 data points to play with.
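
As a sketch of how such an interval might be produced in R: assume a hypothetical data frame gf holding the 10 Grand Finals, with columns mars_diff (the higher-rated team's Rating less the lower-rated team's) and margin (the higher-rated team's winning margin):

    # Fit the linear model and ask for a 50% prediction interval
    # at this year's Ratings difference of 26.1 points.
    gf_fit <- lm(margin ~ mars_diff, data = gf)
    predict(gf_fit, newdata = data.frame(mars_diff = 26.1),
            interval = "prediction", level = 0.5)

With only 10 observations the interval is inevitably wide, which is exactly the point being made above.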

Just Because You're Stable, Doesn't Mean You're Normal

As so many traders discovered to their individual and often, regrettably, our collective cost over the past few years, betting against longshots, deliberately or implicitly, can be a very lucrative gig until an event you thought was a once-in-a-virtually-never affair crops up a couple of times in a week. And then a few more times again after that.

To put a footballing context on the topic, let's imagine that a friend puts the following proposition bet to you: if none of the first 100 home-and-away games next season includes one with a handicap-adjusted margin (HAM) for the home team of -150 or less he'll pay you $100; if there is one or more games with a HAM of -150 or less, however, you pay him $10,000.

For clarity, by "handicap-adjusted margin" I mean the number that you get if you subtract the away team's score from the home team's score and then add the home team's handicap. So, for example, if the home team was a 10.5 point favourite but lost 100-75, then the handicap adjusted margin would be 75-100-10.5, or -35.5 points.

A First Assessment

At first blush, does the bet seem fair?

We might start by relying on the availability heuristic and ask ourselves how often we can recall a game that might have produced a HAM of -150 or less. To make that a tad more tangible, how often can you recall a team losing by more than 150 points when it was roughly an equal favourite or by, say, 175 points when it was a 25-point underdog?

Almost never, I'd venture. So, offering 100/1 odds about this outcome occurring once or more in 100 games probably seems attractive.

Ahem ... the data?

Maybe you're a little more empirical than that and you'd like to know something about the history of HAMs. Well, since 2006, which is a period covering just under 1,000 games and that spans the entire extent - the whole hog, if you will - of my HAM data, there's never been a HAM under -150.

One game produced a -143.5 HAM; the next lowest after that was -113.5. Clearly then, the HAM of -143.5 was an outlier, and we'd need to see another couple of scoring shots on top of that effort in order to crack the -150 mark. That seems unlikely.

In short, we've never witnessed a HAM of -150 or less in about 1,000 games. On that basis, the bet's still looking good.

But didn't you once tell me that HAMs were Normal?

Before we commit ourselves to the bet, let's consider what else we know about HAMs.

Previously, I've claimed that HAMs seemed to follow a normal distribution and, in fact, the HAM data comfortably passes the Kolmogorov-Smirnov test of Normality (one of the few statistical tests I can think of that shares at least part of its name with the founder of a distillery).

Now technically the HAM data's passing this test means only that we can't reject the null hypothesis that it follows a Normal distribution, not that we can positively assert that it does. But given the ubiquity of the Normal distribution, that's enough prima facie evidence to proceed down this path of enquiry.

To do that we need to calculate a couple of summary statistics for the HAM data. Firstly, we need to calculate the mean, which is +2.32 points, and then we need to calculate the standard deviation, which is 36.97 points. A HAM of -150 therefore represents an event approximately 4.12 standard deviations from the mean.

If HAMs are Normal, that's certainly a once-in-a-very-long-time event. Specifically, it's an event we should expect to see only about every 52,788 games, which, to put it in some context, is almost exactly 300 times the length of the 2010 home-and-away season.

With a numerical estimate of the likelihood of seeing one such game we can proceed to calculate the likelihood of seeing one or more such game within the span of 100 games. The calculation is 1-(1-1/52,788)^100 or 0.19%, which is about 525/1 odds. At those odds you should expect to pay out that $10,000 about 1 time in 526, and collect that $100 on the 525 other occasions, which gives you an expected profit of $80.81 every time you take the bet.
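
That calculation is simple enough to verify; here's a sketch in R using the summary statistics quoted above:

    # P(HAM <= -150) under the Normal assumption.
    p_game <- pnorm(-150, mean = 2.32, sd = 36.97)
    1 / p_game                          # about one game in every 53,000
    p_bet <- 1 - (1 - p_game)^100       # chance of at least one such game in 100
    100 * (1 - p_bet) - 10000 * p_bet   # expected profit, about $81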

That still looks like a good deal.

Does my tail look fat in this?

This latest estimate carries all the trappings of statistical soundness, but it does hinge on the faith we're putting in that 1 in 52,788 estimate, which, in turn, hinges on our faith that HAMs are Normal. In the current instance this faith needs to hold not just in the range of HAMs that we see for most games - somewhere in the -30 to +30 range - but way out in the arctic regions of the distribution rarely seen by man, the part of the distribution that is technically called the 'tails'.

There are a variety of phenomena that can be perfectly adequately modelled by a Normal distribution for most of their range - financial returns are a good example - but that exhibit what are called 'fat tails', which means that extreme values occur more often than we would expect if the phenomenon faithfully followed a Normal distribution across its entire range of potential values. For most purposes 'fat tails' are statistically vestigial in their effect - they're an irrelevance. But when you're worried about extreme events, as we are in our proposition bet, they matter a great deal.

A class of distributions that don't get a lot of press - probably because the branding committee that named them clearly had no idea - but that are ideal for modelling data that might have fat tails are the Stable Distributions. They include the Normal Distribution as a special case - Normal by name, but abnormal within its family.

If we fit (using Maximum Likelihood Estimation if you're curious) a Stable Distribution to the HAM data we find that the best fit corresponds to a distribution that's almost Normal, but isn't quite. The apparently small difference in the distributional assumption - so small that I abandoned any hope of illustrating the difference with a chart - makes a huge difference in our estimate of the probability of losing the bet. Using the best fitted Stable Distribution, we'd now expect to see a HAM of -150 or lower about 1 game in every 1,578 which makes the likelihood of paying out that $10,000 about 7%.

Suddenly, our seemingly attractive wager has a -$607 expectation.
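
For the curious, here's roughly what that tail calculation looks like in R using the stabledist package. The fitted parameter values aren't reported above, so the ones used here are purely illustrative - a stability parameter alpha just below 2 is what "almost Normal, but isn't quite" amounts to (alpha = 2 recovers the Normal exactly):

    # Tail probability of a HAM of -150 or less under a near-Normal
    # Stable Distribution. All four parameter values are hypothetical.
    library(stabledist)
    p_game <- pstable(-150, alpha = 1.95, beta = 0, gamma = 26, delta = 2.3)
    1 / p_game   # many times more frequent than the Normal fit suggests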

Since we almost saw - if that makes any sense - a HAM of -150 in our sample of under 1,000 games, there's some intuitive appeal in an estimate that's only a bit smaller than 1 in 1,000, rather than the far smaller 1 in 52,788 we obtained when we used the Normal approximation.

Is there any practically robust way to decide whether HAMs truly follow a Normal distribution or a Stable Distribution? Given the sample that we have, not in the part of the distribution that matters to us in this instance: the tails. We'd need a sample many times larger than the one we have in order to estimate the true probability to an acceptably high level of certainty, and by then would we still trust what we'd learned from games that were decades, possibly centuries old?

Is There a Lesson in There Somewhere?

The issue here, and what inspired me to write this blog, is the oft-neglected truism - an observation that I've read and heard Nassim Taleb of "Black Swan" fame make on a number of occasions - that rare events are, well, rare, and so estimating their likelihood is inherently difficult and, if you've a significant interest in the outcome, financially or otherwise, dangerous.

For many very rare events we simply don't have sufficiently large or lengthy datasets on which to base robust probability estimates for those events. Even where we do have large datasets we still need to justify a belief that the past can serve as a reasonable indicator of the future.

What if, for example, the Gold Coast team prove to be particularly awful next year and get thumped regularly and mercilessly by teams of the Cats' and the Pies' pedigrees? How good would you feel then about betting against a -150 HAM?

So when some group or other tells you that a potential catastrophe is a 1-in-100,000 year event, ask them what empirical basis they have for claiming this. And don't bet too much on the fact that they're right.

Coast-to-Coast Blowouts: Who's Responsible and When Do They Strike?

Previously, I created a Game Typology for home-and-away fixtures and then went on to use that typology to characterise whole seasons and eras.

In this blog we'll use that typology to investigate the winning and losing tendencies of individual teams and to consider how the mix of different game types varies as the home-and-away season progresses.

First, let's look at the game type profile of each team's victories and losses in season 2010.

2010 - Game Type by Team 2010.png

Five teams made a habit of recording Coast-to-Coast Comfortably victories this season - Carlton, Collingwood, Geelong, Sydney and the Western Bulldogs - all of them finalists, and all of them winning in this fashion at least 5 times during the season.

Two other finalists, Hawthorn and the Saints, were masters of the Coast-to-Coast Nail-Biter. They, along with Port Adelaide, registered four or more of this type of win.

Of the six other game types there were only two that any single team recorded on 4 occasions. The Roos managed four Quarter 2 Press Light victories, and Geelong had four wins categorised as Quarter 3 Press victories.

Looking next at loss typology, we find six teams specialising in Coast-to-Coast Comfortably losses. One of them is Carlton, who also appeared on the list of teams specialising in wins of this variety, reinforcing the point that I made in an earlier blog about the Blues' fate often being determined in 2010 by their 1st quarter performance.

The other teams on the list of frequent Coast-to-Coast Comfortably losers are, unsurprisingly, those from positions 13 through 16 on the final ladder, and the Roos. They finished 9th on the ladder but recorded a paltry 87.4 percentage, this the logical consequence of all those Coast-to-Coast Comfortably losses.

Collingwood and Hawthorn each managed four losses labelled Coast-to-Coast Nail-Biters, and West Coast lost four encounters that were Quarter 2 Press Lights, and four more that were 2nd-Half Revivals where they weren't doing the reviving.

With only 22 games to consider for each team it's hard to get much of a read on general tendencies. So let's increase the sample by an order of magnitude and go back over the previous 10 seasons.

2010 - Game Type by Team 2001-2010.png

Adelaide's wins have come disproportionately often from presses in the 1st or 2nd quarters and relatively rarely from 2nd-Half Revivals or Coast-to-Coast results. They've had more than their expected share of losses of type Q2 Press Light, but less than their share of Q1 Press and Coast-to-Coast losses. In particular, they've suffered few Coast-to-Coast Blowout losses.

Brisbane have recorded an excess of Coast-to-Coast Comfortably and Blowout victories and less Q1 Press, Q3 Press and Coast-to-Coast Nail-Biters than might be expected. No game type has featured disproportionately more often amongst their losses, but they have had relatively few Q2 Press and Q3 Press losses.

Carlton has specialised in the Q2 Press victory type and has, relatively speaking, shunned Q3 Press and Coast-to-Coast Blowout victories. Their losses also include a disproportionately high number of Q2 Press losses, which suggests that, over the broader time horizon of a decade, Carlton's fate has been more about how they've performed in the 2nd term. Carlton have also suffered a disproportionately high share of Coast-to-Coast Blowouts - which is I suppose what a Q2 Press loss might become if it gets ugly - yet have racked up fewer than the expected number of Coast-to-Coast Nail-Biters and Coast-to-Coast Comfortablys. If you're going to lose Coast-to-Coast, you might as well make it a big one.

Collingwood's victories have been disproportionately often 2nd-Half Revivals or Coast-to-Coast Blowouts and not Q1 Presses or Coast-to-Coast Nail-Biters. Their pattern of losses has been partly a mirror image of their pattern of wins, with a preponderance of Q1 Presses and Coast-to-Coast Nail-Biters and a scarcity of 2nd-Half Revivals. They've also, however, had few losses that were Q2 or Q3 Presses or that were Coast-to-Coast Comfortablys.

Wins for Essendon have been Q1 Presses or Coast-to-Coast Nail-Biters unexpectedly often, but have been Q2 Press Lights or 2nd-Half Revivals significantly less often than for the average team. The only game type overrepresented amongst their losses has been the Coast-to-Coast Comfortably type, while Coast-to-Coast Blowouts, Q1 Presses and, especially, Q2 Presses have been significantly underrepresented.

Fremantle's had a penchant for leaving their runs late. Amongst their victories, Q3 Presses and 2nd-Half Revivals occur more often than for the average team, while Coast-to-Coast Blowouts are relatively rare. Their losses also have a disproportionately high showing of 2nd-Half Revivals and an underrepresentation of Coast-to-Coast Blowouts and Coast-to-Coast Nail-Biters. It's fair to say that Freo don't do Coast-to-Coast results.

Geelong have tended to either dominate throughout a game or to leave their surge until later. Their victories are disproportionately of the Coast-to-Coast Blowout and Q3 Press varieties and are less likely to be Q2 Presses (Regular or Light) or 2nd-Half Revivals. Losses have been Q2 Press Lights more often than expected, and Q1 Presses, Q3 Presses or Coast-to-Coast Nail-Biters less often than expected.

Hawthorn have won with Q2 Press Lights disproportionately often, but have recorded 2nd-Half Revivals relatively infrequently and Q2 Presses very infrequently. Q2 Press Lights are also overrepresented amongst their losses, while Q2 Presses and Coast-to-Coast Nail-Biters appear less often than would be expected.

The Roos specialise in Coast-to-Coast Nail-Biter and Q2 Press Light victories and tend to avoid Q2 and Q3 Presses, as well as Coast-to-Coast Comfortably and Blowout victories. Losses have come disproportionately from the Q3 Press bucket and relatively rarely from the Q2 Press (Regular or Light) categories. The Roos generally make their supporters wait until late in the game to find out how it's going to end.

Melbourne heavily favour the Q2 Press Light style of victory and have tended to avoid any of the Coast-to-Coast varieties, especially the Blowout variant. They have, however, suffered more than their share of Coast-to-Coast Comfortably losses, but less than their share of Coast-to-Coast Blowout and Q2 Press Light losses.

Port Adelaide's pattern of victories has been a bit like Geelong's. They too have won disproportionately often via Q3 Presses or Coast-to-Coast Blowouts and their wins have been underrepresented in the Q2 Press Light category. They've also been particularly prone to Q2 and Q3 Press losses, but not to Q1 Presses or 2nd-Half Revivals.

Richmond wins have been disproportionately 2nd-Half Revivals or Coast-to-Coast Nail-Biters, and rarely Q1 or Q3 Presses. Their losses have been Coast-to-Coast Blowouts disproportionately often, but Coast-to-Coast Nail-Biters and Q2 Press Lights relatively less often than expected.

St Kilda have been masters of the foot-to-the-floor style of victory. They're overrepresented amongst Q1 and Q2 Presses, as well as Coast-to-Coast Blowouts, and underrepresented amongst Q3 Presses and Coast-to-Coast Comfortablys. Their losses include more Coast-to-Coast Nail-Biters than the average team, and fewer Q1 and Q3 Presses, and 2nd-Half Revivals.

Sydney's win profile almost mirrors the average team's, with the sole exception being a relative abundance of Q3 Presses. Their profile of losses, however, differs significantly from the average and shows an excess of Q1 Presses, 2nd-Half Revivals and Coast-to-Coast Nail-Biters, a relative scarcity of Q3 Presses and Coast-to-Coast Comfortablys, and a virtual absence of Coast-to-Coast Blowouts.

West Coast victories have come disproportionately as Q2 Press Lights and have rarely been of any other of the Press varieties. In particular, Q2 Presses have been relatively rare. Their losses have all too often been Coast-to-Coast Blowouts or Q2 Presses, and have come as Coast-to-Coast Nail-Biters relatively infrequently.

The Western Bulldogs have won with Coast-to-Coast Comfortablys far more often than the average team, and with the other two varieties of Coast-to-Coast victories far less often. Their profile of losses mirrors that of the average team excepting that Q1 Presses are somewhat underrepresented.

We move now from associating teams with various game types to associating rounds of the season with various game types.

You might wonder, as I did, whether different parts of the season tend to produce a greater or lesser proportion of games of particular types. Do we, for example, see more Coast-to-Coast Blowouts early in the season when teams are still establishing routines and disciplines, or later on in the season when teams with no chance meet teams vying for preferred finals berths?

2010 - Game Type by Round 2001-2010.png

For this chart, I've divided the seasons from 2001 to 2010 into rough quadrants, each spanning 5 or 6 rounds.

The Coast-to-Coast Comfortably game type occurs most often in the early rounds of the season, then falls away a little through the next two quadrants before spiking a little in the run up to the finals.

The pattern for the Coast-to-Coast Nail-Biter game type is almost the exact opposite. It's relatively rare early in the season and becomes more prevalent as the season progresses through its middle stages, before tapering off in the final quadrant.

Coast-to-Coast Blowouts occur relatively infrequently during the first half of the season, but then blossom, like weeds, in the second half, especially during the last 5 rounds when they reach near-plague proportions.

Quarter 1 and Quarter 2 Presses occur with similar frequencies across the season, though they both show up slightly more often as the season progresses. Quarter 2 Press Lights, however, predominate in the first 5 rounds of the season and then decline in frequency across rounds 6 to 16 before tapering dramatically in the season's final quadrant.

Quarter 3 Presses occur least often in the early rounds, show a mild spike in Rounds 6 to 11, and then taper off in frequency across the remainder of the season. 2nd-Half Revivals show a broadly similar pattern.

2010: Just How Different Was It?

Last season I looked at Grand Final Typology. In this blog I'll start by presenting a similar typology for home-and-away games.

In creating the typology I used the same clustering technique that I used for Grand Finals - what's called Partitioning Around Medoids, or PAM - and I used similar data. Each of the 13,144 home-and-away season games was characterised by four numbers: the winning team's lead at quarter time, at half-time, at three-quarter time, and at full time.

With these four numbers we can calculate a measure of distance between any pair of games and then use the matrix of all these distances to form clusters or types of games.
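
In R, the core of this approach might look something like the sketch below (the games data frame and its column names are, again, hypothetical):

    # Cluster games on the winning team's lead at each change,
    # using Partitioning Around Medoids from the cluster package.
    library(cluster)
    leads <- games[, c("q1_lead", "q2_lead", "q3_lead", "final_margin")]
    game_types <- pam(dist(leads), k = 8)   # the 8 types settled on below
    table(game_types$clustering)            # games per type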

After a lot of toing, froing, re-toing and re-froing, I settled on a typology of 8 game types:

2010 - Types of Home and Away Game.png

Typically, in the Quarter 1 Press game type, the eventual winning team "presses" in the first term and leads by about 4 goals at quarter-time. At each subsequent change and at the final siren, the winning team typically leads by a little less than the margin it established at quarter-time. Generally the final margin is about 3 goals. This game type occurs about 8% of the time.

In a Quarter 2 Press game type the press is deferred, and the eventual winning team typically trails by a little over a goal at quarter-time but surges in the second term to lead by four-and-a-half goals at the main break. They then cruise in the third term and extend their lead by a little in the fourth and ultimately win quite comfortably, by about six and a half goals. About 7% of all home-and-away games are of this type.

The Quarter 2 Press Light game type is similar to a Quarter 2 Press game type, but the surge in the second term is not as great, so the eventual winning team leads at half-time by only about 2 goals. In the second half of a Quarter 2 Press Light game the winning team provides no assurances for its supporters and continues to lead narrowly at three-quarter time and at the final siren. This is one of the two most common game types, and describes almost 1 in 5 contests.

Quarter 3 Press games are broadly similar to Quarter 1 Press games up until half-time, though the eventual winning team typically has a smaller lead at that point in a Quarter 3 Press game type. The surge comes in the third term where the winners typically stretch their advantage to around 7 goals and then preserve this margin until the final siren. Games of this type comprise about 10% of home-and-away fixtures.

2nd-Half Revival games are particularly closely fought in the first two terms with the game's eventual losers typically having slightly the better of it. The eventual winning team typically trails by less than a goal at quarter-time and at half-time before establishing about a 3-goal lead at the final change. This lead is then preserved until the final siren. This game type occurs about 13% of the time.

A Coast-to-Coast Nail-Biter is the game type that's the most fun to watch - provided it doesn't involve your team, especially if your team's on the losing end of one of these contests. In this game type the same team typically leads at every change, but by less than a goal to a goal and a half. Across history, this game type has made up about one game in six.

The Coast-to-Coast Comfortably game type is fun to watch as a supporter when it's your team generating the comfort. Teams that win these games typically lead by about two and a half goals at quarter-time, four and a half goals at half-time, six goals at three-quarter time, and seven and a half goals at the final siren. This is another common game type - expect to see it about 1 game in 5 (more often if you're a Geelong or a West Coast fan, though with vastly differing levels of pleasure depending on which of these two you support).

Coast-to-Coast Blowouts are hard to love and not much fun to watch for any but the most partial observer. They start in the manner of a Coast-to-Coast Comfortably game, with the eventual winner leading by about 2 goals at quarter time. This lead is extended to six and a half goals by half-time - at which point the word "contest" no longer applies - and then further extended in each of the remaining quarters. The final margin in a game of this type is typically around 14 goals and it is the least common of all game types. Throughout history, about one contest in 14 has been spoiled by being of this type.

Unfortunately, in more recent history the spoilage rate has been higher, as you can see in the following chart (for the purposes of which I've grouped the history of the AFL into eras each of 12 seasons, excepting the most recent era, which contains only 6 seasons. I've also shown the profile of results by game type for season 2010 alone).

2010 - Profile of Game Types by Era.png

The pies in the bottom-most row show the progressive growth in the Coast-to-Coast Blowout commencing around the 1969-1980 era and reaching its apex in the 1981-1992 era where it described about 12% of games.

In the two most-recent eras we've seen a smaller proportion of Coast-to-Coast Blowouts, but they've still occurred at historically high rates of about 8-10%.

We've also witnessed a proliferation of Coast-to-Coast Comfortably and Coast-to-Coast Nail-Biter games in this same period, not least in the current season, where these game types described about 27% and 18% of contests respectively.

In total, almost 50% of the games this season were Coast-to-Coast contests - that's about 8 percentage points higher than the historical average.

Of the five non Coast-to-Coast game types, three - Quarter 2 Press, Quarter 3 Press and 2nd-Half Revival - occurred at about their historical rates this season, while Quarter 1 Press and Quarter 2 Press Light game types both occurred at about 75-80% of their historical rates.

The proportion of games of each type in a season can be thought of as a signature of that season. Being numeric, these signatures provide a ready basis on which to measure how much one season is more or less like another. In fact, using a technique called principal components analysis we can use each season's signature to plot that season in two-dimensional space (using the first two principal components).
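
A sketch of that step in R, assuming a hypothetical matrix sig with one row per season and one column per game type, each cell holding the proportion of that season's games of that type:

    # Project the season signatures onto their first two principal components.
    pca <- prcomp(sig)
    plot(pca$x[, 1], pca$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
    text(pca$x[, 1], pca$x[, 2], labels = rownames(sig))   # label each season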

Here's what we get:

2010 - Home and Away Season Similarity.png

I've circled the point labelled "2010", which represents the current season. The further away is the label for another season, the more different is that season's profile of game types in comparison to 2010's profile.

So, for example, 2009, 1999 and 2005 are all seasons that were quite similar to 2010, and 1924, 1916 and 1958 are all seasons that were quite different. The table below provides the profile for each of the seasons just listed; you can judge the similarity for yourself.

2010 - Seasons Similar to 2010.png

Signatures can also be created for eras and these signatures used to represent the profile of game results from each era. If you do this using the eras as I've defined them, you get the chart shown below.

One way to interpret this chart is that there have been 3 super-eras in VFL/AFL history, the first spanning the seasons from 1897 to 1920, the second from 1921-1980, and the third from 1981-2010. In this latter era we seem to be returning to the profiles of the earliest eras, which was a time when 50% or more of all results were Coast-to-Coast game types.

2010 - Home and Away Era Similarity.png

A Line Betting Enigma

The TAB Sportsbet bookmaker is, as you know, a man to be revered and feared in equal measure. Historically, his head-to-head prices have been so exquisitely well-calibrated that I instinctively compare any model I construct with the forecasts he produces. Showing that a model historically outperforms him leads me to scuttle off to determine what error I've made in constructing the model, what piece of information I've used that, in truth, was only available with the benefit of hindsight.

Trialling The Super Smart Model

The best way to trial a potential Fund algorithm, I'm beginning to appreciate, is to publish each week the forecasts that it makes. This forces me to work through the mechanics of how it would be used in practice and, importantly, to set down what restrictions should be applied to its wagering - for example should it, like most of the current Funds, only bet on Home Teams, and in which round of the season should it start wagering.

What Do Bookies Know That We Don't?

Bookies, I think MAFL has comprehensively shown, know a lot about football, but just how much more do they know than what you or I might glean from a careful review of each team's recent results and some other fairly basic knowledge about the venues at which games are played?

Another Day, Another Model

In the previous blog I developed models for predicting victory margins and found that the selection of a 'best' model depended on the criterion used to measure performance.

In this blog I'll review the models that we developed and then describe how I created another model, this one designed to predict line betting winners.

The Low Average Margin Predictor

The model that produced the lowest mean absolute prediction error (MAPE) was constructed by combining the predictions of two other models. Both of the constituent models - part of a family I collectively call floating window models - looked only at the victory margins and the bookie's home team prices: one for the most recent 22 rounds, and the other for the most recent 35 rounds.

On their own neither of these two models produce especially small MAPEs, but optimally combined they produce an overall model with a 28.999 MAPE across seasons 2008 and 2009 (I know that the three decimal places is far more precision than is warranted, but any rounding's going to nudge it up to 29 which just doesn't have the same ability to impress. I consider it my nod to the retailing industry, which persists in believing that price proximity is not perceived linearly and so, for example, that a computer priced at $999 will be thought meaningfully cheaper than one priced at $1,000).

Those optimal weightings were created in the overall model by calculating the linear combination of the underlying models that would have performed best over the most recent 26 weeks of the competition, and then using those weights for the current week's predictions. These weights will change from week to week as one model or the other tends to perform better at predicting victory margins; that is what gives this model its predictive chops.
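
As a sketch of that weighting step in R: suppose m22 and m35 are vectors holding the two constituent models' margin predictions over the most recent 26 weeks, and actual holds the corresponding results (all hypothetical names, and, for simplicity, the weights here are constrained to sum to one):

    # Find the weight w that would have minimised mean absolute error
    # for the combined prediction w * m22 + (1 - w) * m35.
    mae <- function(w) mean(abs(actual - (w * m22 + (1 - w) * m35)))
    best_w <- optimize(mae, interval = c(0, 1))$minimum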

This low MAPE model henceforth I shall call the Low Average Margin Predictor (or LAMP, for brevity).

The Half Amazing Margin Predictor

Another model we considered produced margin predictions with a very low median absolute prediction error. It was similar to the LAMP but used four rather than two underlying models: the 19-, 36-, 39- and 52-round floating window models.

It boasted a 22.54 point median absolute prediction error over seasons 2008 and 2009, and its predictions have been within 4 goals of the actual victory margin in a tick over 52% of games. What destroys its mean absolute prediction error is its tendency to produce victory margin predictions that are about as close to the actual result as calcium carbonate is to coagulated milk curd. About once every two-and-a-half rounds one of its predictions will prove to be 12 goals or more distant from the actual game result.

Still, its median absolute prediction error is truly remarkable, which in essence means that its predictions are amazing about half the time, so I shall name it the Half Amazing Margin Predictor (or HAMP, for brevity).

In their own highly specialised ways, LAMP and HAMP are impressive but, like left-handed chess players, their particular specialities don't appear to provide them with any exploitable advantage. To be fair, TAB Sportsbet does field markets on victory margins and it might eventually prove that LAMP or HAMP can be used to make money on these markets, but I don't have the historical data to test this now. I do, however, have line market data that enables me to assess LAMP's and HAMP's ability to make money on this market, and they exhibit no such ability. Being good at predicting margins is different from being good at predicting handicap-adjusted margins.

Nonetheless, I'll be publishing LAMP's and HAMP's margin predictions this season.

HELP, I Need Another Model

Well, if we want a model that predicts line market winners then we really should build a dedicated model for this, and that's what I'll describe next.

The type of model that we'll build is called a binary logit. These can be used to fit a model to any phenomenon that is binary - that is, two-valued - in nature. You could, for example, fit one to model the characteristics of people who do or don't respond to a marketing campaign. In that case, the binary variable is campaign response. You could also, as I'll do here, fit a binary logit to model the relationship between home team price and whether or not the home team wins on line betting.

Fitting and interpreting such models is a bit more complicated than fitting and interpreting models fitted using the ordinary least squares method, which we covered in the previous blog. For this reason I'll not go into the details of the modelling here. Conceptually, though, all we're doing is fitting an equation that relates the Home team's head-to-head price to its probability of winning on line betting.
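
In R, the basic fit might look like this, assuming a hypothetical data frame games with a 0/1 column line_win and the Home team's head-to-head price home_price:

    # A binary logit (logistic regression) of line betting success on price.
    logit_fit <- glm(line_win ~ home_price, data = games, family = binomial)
    head(predict(logit_fit, type = "response"))   # fitted probabilities of a line win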

For this modelling exercise I have again created 47 floating window models of the sort I've just described: one model that uses price and line betting result data for only the last 6 rounds, another that uses the same data for the last 7 rounds, and so on, up to one that uses data from the last 52 rounds.

Then, as I did in creating HAMP and LAMP, I looked for the combination of floating window models that best predicts winning line bet teams.

The overall model I found to perform best combines 24 of the 47 floating window models - I'll spare you the Lotto-like list of those models' numbers here. In 2008 this model predicted the line betting winner 57% of the time and in 2009 it predicted 64% of such winners. Combined, that gives it a 61% average across the two seasons. I'll call this model the Highly Evolved Line Predictor (or HELP), the 'highly evolved' part of the name in recognition of the fact that it was selected because of its fitness in predicting line betting winners in the environment that prevailed across the 2008 and 2009 seasons.

Whether HELP will thrive in the new environment of the 2010 season will be interesting to watch, as indeed will be the performance of LAMP and HAMP.

In my previous post I drew the distinction between fitting a model and using it to predict the future and explained that a model can be a good fit to existing data but turn out to be a poor predictor. In that context I mentioned the common statistical practice of fitting a model to one set of data and then measuring its predictive ability on a different set.

HAMP, LAMP and HELP are somewhat odd models in this respect. Certainly, when I've used them to predict they're predicting for games that weren't used in the creation of any of their underlying floating window models. So that's a tick.

They are, however, fitted models in that I generated a large number of potential LAMPs, HAMPs and HELPs, each using a different set of the available floating window models, and then selected those models which best predicted the results of the 2008 and 2009 seasons. Accordingly, it could well be that the superior performance of each of these models can be put down to chance, in which case we'll find that their performances in 2010 will drop to far less impressive levels.

We won't know whether or not we're witnessing such a decline until some way into the coming season but in the meantime we can ponder the basis on which we might justify asserting that the models are not mere chimera.

Recall that each of the floating window models uses nothing more than the price of the Home team as a predictive variable. The convoluted process of combining different floating window models with time-varying weights means that, in essence, the predictions of HAMP, LAMP and HELP are all just sophisticated transformations of one number: the Home team price for the relevant game.

So, for HAMP, LAMP and HELP to be considered anything other than statistical flukes it needs to be the case that:

  1. the TAB Sportsbet bookie's Home team prices are reliable indicators of Home teams' victory margins and line betting success
  2. the association between Home team prices and victory margins, and between Home team prices and line betting outcomes varies in a consistent manner over time
  3. HAMP, LAMP and HELP are constructed in such a way as to effectively model these time-varying relationships

On balance I'd have to say that these conditions are unlikely to be met. Absent the experience gained from running these models live during a fresh season then, there's no way I'd be risking money on any of these models.

Many of the algorithms that support MAFL Funds have been developed in much the same way as I've described in this and the previous blog, though each of them is based on more than a single predictive variable and most of them have been shown to be profitable in testing using previous seasons' data and in real-world wagering.

Regardless, a few seasons of profitability doesn't rule out the possibility that any or all of the MAFL Fund algorithms have just been extremely lucky.

That's why I'm not retired ...

There Must Be 50 Ways to Build a Model (Reprise)

Okay, this posting is going to be a lot longer and a little more technical than the average MAFL blog (and it's not as if the standard fare around here could be fairly characterised as short and simple).

Anyway, over the years of MAFL, people have asked me about the process of building a statistical model in sufficient number and with such apparent interest that I felt it was time to write a blog about it.

Step one in building a model is, as in life, finding a purpose and the purpose of the model I'll be building for this blog is to predict AFL victory margins, surely about as noble a purpose as a model can aspire to. Step two is deciding on the data that will be used to build that model, a decision heavily influenced by expedience; often it's more a case of 'what have I already got that might be predictive?' rather than 'what will I spend the next 4 weeks of my life trying to source because I've an inkling it might help?'.

Expediently enough, the model I'll be building here will use a single input variable: the TAB Sportsbet price of the home team, generally at noon on Wednesday before the game. I have this data going back to 1999, but I've personally recorded prices only since 2006. The remainder of the data I sourced from a website built to demonstrate the efficacy of the site-owner's subscription-based punting service, which makes me trust this data about as much as I trust on-site testimonials from 'genuine' customers. We'll just be using the data for the seasons 2006 to 2009.

Fitting the Simplest Model

The first statistical model I'll fit to the data is what's called an ordinary least-squares regression - surely a name to cripple the self-esteem of even the most robust modelling technique - and is of the form Predicted Margin = a + (b / Home Team Price).

The ordinary least-squares method chooses a and b to minimise the sum of the squared differences between the actual victory margins and those the model predicts and, in this sense, 'fits' the data best of all the possible choices of a and b that we could make.

We've seen the result of fitting this model to the 2006-2009 data in an earlier blog where we saw that it was:

Predicted Margin = -49.17 + 96.31 / Home Team Price

This model fits the data for seasons 2006 to 2009 quite well. The most common measure of how well a model of this type fits is what's called the R-squared and, for this model, it's 0.236, meaning that the model explains a little less than one-quarter of the variability in margins across games.

But this is a difficult measure to which to attach any intuitive meaning. Better perhaps is to know that, on average, this model's predictions are wrong by 29.3 points per game, that for one-half of the games they are within 24.1 points of the actual result, and that for 27% of the games they are within 12 points.
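
A sketch in R of the fit and the error measures just quoted, once more assuming a hypothetical games data frame with columns margin and home_price:

    # Ordinary least-squares fit of margin on the reciprocal of home price.
    ols_fit <- lm(margin ~ I(1 / home_price), data = games)
    coef(ols_fit)                   # intercept and slope, as in the equation above
    ape <- abs(residuals(ols_fit))  # in-sample absolute prediction errors
    mean(ape)                       # the average error
    median(ape)                     # half the games are closer than this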

These results are all very promising but it would be a rookie mistake to start using this model in 2010 with the expectation that it will explain the future as well as it has explained the past. It's quite common for a statistical model to fit existing data well but to forecast as poorly as a surprised psychic ('Jeez, I didn't see that coming!').

Why? Because forecasting and fitting are two very different activities. When we build the model we deliberately make the fit as good as it can be and this can mean that the model we create doesn't faithfully represent the process that created that data. This is known in statistical circles - which, I guess, are only round on average - as 'overfitting' the data and it's one of the many things over which we obsess.

Overfitting is less likely to be a problem for the current model since it has only one variable in it and overfitting is more commonly a disease of multi-variable models, but it's something that it's always wise to check. A bit like checking that you've turned the stove off before you leave home.

Testing the Model

The biggest problem with modelling the future is that it hasn't happened yet (with apologies to whoever I stole or paraphrased that from). In modelling, however, we can create an artificial reality where, as far as our model's concerned, the future hasn't yet happened. We do this by fitting the model to just a part of the data we have, saving some for later as it were.

So, here we could fit the 2006 season's data and use the resulting model to predict the 2007 results. We could then repeat this by fitting a model to the 2007 data only and then use that model to predict the 2008 results, and then do something similar for 2009. Collectively, I'll call the models that I've fitted using this approach "Single Season" models.

Each Single Season model's forecasting ability can be calculated from the difference between the predictions it makes and the results of the games in the subsequent season. If the Single Season models overfit the data then they'll tend to fit the data well but predict the future badly.

The results of fitting and using the Single Season models are as follows:

2010 - Bookie Model Comparisons.png

The first column, for comparative purposes, shows the results for the simple model fitted to the entire data set (ie all of 2006 to 2009), and the next three columns show the results for each of the Single Season models. The final column averages the results for all the Single Season models and provides results that are the most directly comparable with those in the first column.

On balance, our fears of overfitting appear unfounded. The average and median prediction errors are very similar, although the Single Season models are a little worse at making predictions that are within 3 goals of the actual result. Still, the predictions they produce seem good enough.

What Is It Good For?

The Single Season approach looks promising. One way that it might have a practical value is if it can be used to predict the handicap winners of each game at a rate sufficient to turn a profit.

Unfortunately, it can't. In 2007 and 2008 it does slightly better than chance, predicting 51.4% of handicap winners, but in 2009 it predicts only 48.1% of winners. None of these performances is good enough to turn a profit: at the standard price of $1.90, a winning bet returns only 90 cents per dollar staked, so you need a strike rate better than 1/1.90, which is 52.6%, just to break even.

In retrospect, this is not entirely surprising. Using a bookie's own head-to-head prices to beat him on the line market would be just too outrageous.

Hmmm. What next then?

Working with Windows

Most data, in a modelling context, has a brief period of relevance that fades and, eventually, expires. In attempting to predict the result of this week's Geelong v Carlton game, for example, it's certainly relevant to know that Geelong beat St Kilda last week and that Carlton lost to Melbourne. It's also probably relevant to know that Geelong beat Carlton when they last played 11 weeks ago, but it's almost certainly irrelevant to know that Carlton beat Collingwood in 2007. Finessing this data relevance envelope by tweaking the weights of different pieces of data depending on their age is one of the black arts of modelling.

All of the models we've constructed so far in this blog have a distinctly black-and-white view of data. Every game in the data set that the model uses is treated equally regardless of whether it pertains to a game played last week, last month, or last season, and every game not in the data set is ignored.

There are a variety of ways to deal with this bipolarity, but the one I'll be using here for the moment is what I call the 'floating window' approach. Under it, a model is always constructed using the most recent X rounds of data. That model is used to predict only the current round's results and is then rebuilt the following week for the next round's action. So, for example, if we built a model with a 6-round floating window then, in looking to predict the results for Round 8 of a given season, we'd use the results for Rounds 2 through 7 of that season. The next week we'd use the results for Rounds 3 through 8, and so on. For the early rounds of a season we'd reach back and use the previous year's results, including finals.
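A sketch of the floating window mechanics, once more with invented file and column names, including a round_index that I'm assuming runs continuously across seasons:

```python
import pandas as pd
import statsmodels.api as sm

games = pd.read_csv("results.csv")  # assumed columns: round_index, home_price, margin

def floating_window_forecasts(games, window):
    """Fit on the most recent `window` rounds, predict the current round."""
    forecasts = {}
    for rnd in sorted(games["round_index"].unique()):
        train = games[games["round_index"].between(rnd - window, rnd - 1)]
        test = games[games["round_index"] == rnd]
        if train.empty:
            continue  # not enough history yet
        fit = sm.OLS(train["margin"],
                     sm.add_constant(1.0 / train["home_price"])).fit()
        forecasts[rnd] = fit.predict(
            sm.add_constant(1.0 / test["home_price"], has_constant="add"))
    return forecasts

preds = floating_window_forecasts(games, window=6)  # eg Rounds 2-7 predict Round 8
```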

So, next, I've created 47 models using floating windows ranging from 6-round to 52-round. Their performance across seasons 2008 and 2009 is summarised in the following charts.

First let's look at the mean and median APEs:

2010 - Floating Window APE.png

Broadly what we see here is that, in terms of mean APE, larger floating windows are better than smaller ones, but the improvement is minimal from about an 11-round window onwards. The median APE story is quite different. There is a marked minimum with a 9-round floating window, and 8-round and 10-round floating windows also perform well.

Next let's take a look at how often the 47 models produce predictions close to the actual result:

2010 - Floating Window Accuracy.png

The top line charts the percentage of time that the relevant model produces predictions that are 3 goals or less distant from the actual result. The middle line is similarly constructed but for a 2-goal distance, and the bottom line is for a 1-goal distance.

Floating windows in the 8- to 11-round range all perform well on all three metrics, consistent with their strong performance in terms of median APE. The 16-round, 17-round and 18-round floating window models also perform well in terms of frequently producing predictions that are within 2 goals of the actual victory margin.

Next let's look at how often the 47 models produce predictions that are very wrong:

2010 - Floating Window Accuracy 36.png

In this chart, unlike the previous chart, lower is better. Here we again find that larger floating windows are better than smaller ones, but only to a point, the effect plateauing with floating windows in the 30s.

Again, though, to consider each model's potential punting value, we can look at its handicap betting performance.

2010 - Floating Window Line.png

On this measure, only the model with an 11-round floating window seems to have any exploitable potential.

But, like Columbo, we just have one more question to ask of the data ...

Dynamic Weighted Floating Windows

(Warning: This next bit hurts my head too.)

We now have 47 floating window models offering an opinion on the likely outcomes of the games in any round. What if we pooled those opinions? But not all opinions are of equal value, so which should we include and which should we ignore? What if we determined which opinions to pool based on the ability of different subsets of those 47 models to fit the results of, say, the 26 rounds preceding the one we're trying to predict? And what if we updated those weights each round based on the latest results?
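Conceptually, the pooling step is just another regression, this time of actual margins on the member models' forecasts over a trailing window of rounds. Here's a rough sketch of the idea, with all names invented:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def pooled_forecast(forecasts, current_round, members=("fw_22", "fw_35"),
                    lookback=26):
    """Weight member models by how well their forecasts fit recent results.

    `forecasts` is an assumed DataFrame with one row per game and columns
    round_index, margin, and one forecast column per member model.
    """
    recent = forecasts[forecasts["round_index"].between(
        current_round - lookback, current_round - 1)]
    this_round = forecasts[forecasts["round_index"] == current_round]

    # Regress actual margins on the members' forecasts to find the weights ...
    X_fit = np.column_stack([np.ones(len(recent))] +
                            [recent[m].values for m in members])
    weights = sm.OLS(recent["margin"].values, X_fit).fit()

    # ... then apply those weights to the members' forecasts for this round.
    X_new = np.column_stack([np.ones(len(this_round))] +
                            [this_round[m].values for m in members])
    return weights.predict(X_new)
```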

Okay, I've done all that (and, yes, it took a while to conceptualise and code, and my first version, previously published here, had an error that caused me to overstate the predictive power of one of the pooled models, but I got there eventually). Here's the APE data again, now including a few extra models based on this pooling idea:

2010 - Floating Window APE with Dyn.png

(The dynamic floating window model results are labelled "Dynamic Linear I (22+35)" and "Dynamic Linear II (19+36+39+52)". The numbers in brackets are the Floating Window model forecasts that have been pooled to form the Dynamic Linear model. So, for example, the Dynamic Linear I model pools only the opinions of the Floating Window models based on a 22-round and a 35-round window. It determines how best to weight the opinions of these two Floating Window models by optimising over the past 26 rounds.

I've also shown the results for the Single Season models - they're labelled 'All of Prev Season' - and for a model that always uses all data from the start of 2006 up to but excluding the current round, labelled 'All to Current'.)

The mean APE results suggest that, for this performance metric at least, models with more data tend to perform better than models with less. The best Dynamic Linear model I could find, for all its sophistication, still only managed to produce a mean APE 0.05 points per game lower than the simple model that used all the data since the start of 2006, weighting each game equally.

It is another Dynamic Linear model that shoots the lights out on the median APE results, however. The Dynamic Linear model that optimally combines the opinions of 19-, 36-, 39- and 52-round floating windows produces forecasts with a median APE of just 22.54 points per game.

The next couple of charts show that this superior performance stems from the Dynamic Linear model's all-around ability: it isn't best in terms of producing the most APEs under 7 points, nor in terms of producing the fewest APEs of 36 points or more, but it performs solidly on both counts.

2010 - Floating Window Accuracy with Dyn.png
2010 - Floating Window Accuracy 36 with Dyn.png

Okay, here's the clincher. Does either of the Dynamic Linear models do much of a job of predicting handicap winners?

2010 - Floating Window Line with Dyn.png

Nope. The best models for predicting handicap winners are the 11-round floating window model and the model formed by using all the data since the start of 2006. They each manage to be right just over 53% of the time - a barely exploitable edge.

The Moral So Far ...

What we've seen in these results is consistent with what I've found over the years in modelling the footy: models tend to be highly specialised, so one that performs well in terms of, say, mean APE, will often not perform as well in terms of median APE.

Perhaps it's no surprise then that none of the models we've produced so far has been any good at predicting handicap winners. To build such a model we need to start out with that as the explicit modelling goal, and that's a topic for a future blog.

Predicting Margins Using Market Prices and MARS Ratings

Imagine that you allowed me to ask you for just one piece of data about an upcoming AFL game. Armed with that single piece of data I contend that I will predict the margin of that game and, on average, be within 5 goals of the actual margin. Further, one-half of the time I'll be within 4 goals of the final margin and one-third of the time I'll be within 3 goals. What piece of data do you think I am going to ask you for?

I'll ask you for the bookies' price for the home team, true or notional, and I'll plug that piece of data into this equation:

Predicted Margin = -49.17 + 96.31 x (1 / Home Team Price)

(A positive margin means that the Home Team is predicted to win, a negative margin that the Away Team is predicted to win. So, at a Home Team price of $1.95 the Home Team is predicted to win; at $1.96 the Away Team is predicted to squeak home.)
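That flip point falls directly out of the equation: the predicted margin is zero where 96.31 divided by the price equals 49.17.

```python
# Price at which the predicted margin crosses zero: -49.17 + 96.31/p = 0
crossover_price = 96.31 / 49.17
print(round(crossover_price, 3))  # 1.959, hence $1.95 favours home, $1.96 favours away
```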

Over the period 2006 to 2009 this simple equation has performed as I described in the opening paragraph and explains 23.6% of the variability in the victory margins across games.

Here's a chart showing the performance of this equation across seasons 2006 to 2009.

2010 Predicted v Actual Margin.png

The red line shows the margin predicted by the formula and the crosses show the actual results for each game. You can see that the crosses are fairly well described by the line, though the crosses are dense in the $1.50 to $2.00 range, so here's a chart showing only those games with a home team price of $4 or less.

How extraordinary to find a model so parsimonious yet so predictive. Those bookies do know a thing or two, don't they?

Now what if I was prohibited from asking you for any bookie-related data but, as a trade-off, was allowed two pieces of data rather than one? Well, then I'd be asking you for my MARS Ratings of the teams involved (though quite why you'd have my Ratings and I'd need to ask you for them spoils the narrative a mite).

The equation I'd use then would be the following:

Predicted Margin = -69.79 + 0.779 x MARS Rating of Home Team - 0.702 x MARS Rating of Away Team

Switching from the bookies' brains to my MARS' mindless maths makes surprisingly little difference. Indeed, depending on your criterion, the MARS Model might even be considered superior, your Honour.

The prosecution would point out that the MARS Model explains about 1.5% less of the overall variability in victory margins. The case for the defence would counter that the MARS Model predicts margins that are within 6 points of the actual margin over 15% of the time, more than 1.5 percentage points more often than the bookies' model manages, and would also avow that the MARS Model's predictions are 6 goals or more distant from the actual margin less often than are the predictions from the bookies' model.

So, if you're looking for a model that better fits the entire set of data, then percent of variability explained is your metric and the bookies' model is your winner. If, instead, you want a model that's more often very close to the true margin and less often very distant from it, then the MARS Model rules.

Once again we have a situation where a mathematical model, with no knowledge of player ins and outs, no knowledge of matchups or player form or player scandals, with nothing but a preternatural recollection of previous results, performs at a level around or even above that of an AFL-obsessed market-maker.

A concept often used in modelling is that of information. In the current context we can say that a bookie's home team price contains information about the likely victory margin. We can also say that my MARS Ratings contain information about likely victory margins. One interesting question is whether the bookie's price carries essentially the same information as my MARS Ratings, or whether their combination contains some additional information.

To find out, we fit a model using all three variables - the Home Team price, the Home Team MARS Rating, and the Away Team MARS Rating - and we find that all three are statistically significant at the 10% level. On that basis we can claim that each variable contains some unique information that helps to explain a game's victory margin.

The model we get, which I'll call the Combined Model, is:

Predicted Margin = -115.63 + 67.02 / Home Team Price + 0.31 x MARS Rating of Home Team - 0.22 x MARS Rating of Away Team
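A sketch of that three-variable fit, using the same invented data layout as the earlier snippets (the MARS Rating column names are also mine):

```python
import pandas as pd
import statsmodels.api as sm

games = pd.read_csv("results_2006_2009.csv")
# assumed columns: home_price, home_mars, away_mars, margin

X = pd.DataFrame({
    "inv_home_price": 1.0 / games["home_price"],
    "home_mars": games["home_mars"],
    "away_mars": games["away_mars"],
})
fit = sm.OLS(games["margin"], sm.add_constant(X)).fit()

print(fit.params)   # should land near -115.63, 67.02, 0.31 and -0.22
print(fit.pvalues)  # the text reports all three regressors significant at the 10% level
```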

A summary of this model and the two we covered earlier appears in the following table:

2010 Bookies v MARS.png

The Combined Model - the one that uses the bookie price and MARS ratings - explains over 24% of the variability in victory margin and has an average absolute prediction error of just 29.2 points. It produces these more accurate predictions not by being very close to the actual margin more often - in fact, it's within 6 points of the actual margin only about 13% of the time - but, instead, by being a long way from the actual margin less often.

Its margin prognostications are sufficiently accurate that, based on them, the winning team on handicap betting is identified a little over 53% of the time. Of course, it's one thing to fit a dataset that well and another thing entirely to convert that performance into profitable forecasts.

The Draw's Unbalanced: So What?

In an earlier blog we looked at how each team had fared in the 2010 draw and assessed the relative difficulty of each team's draw by, somewhat crudely, estimating a (weighted) average MARS Rating of the teams they played. From the point of view of the competition ladder, however, what matters is not the differences in the average MARS Rating of the teams played, but how these differences, on a game-by-game basis, translate into expected competition points.

Losing Does Lead to Winning But Only for Home Teams (and only sometimes)

For reasons that aren't even evident to me, I decided to revisit the issue of "when losing leads to winning", which I looked at a few blogs back.

In that earlier piece no distinction was made between which team - home or away - was doing the losing or the winning. Such a distinction, it turns out, is important in uncovering evidence for the phenomenon in question.

Put simply, there is some statistical evidence across the home-and-away matches from 1980 to 2008 that home teams that trail by between 1 and 4 points at quarter time, or by 1 point at three-quarter time, tend to win more often than they lose. There is no such statistical evidence for away teams.

The table below shows the proportion of times that the home team has won when leading or trailing by the amount shown at quarter time, half time or three-quarter time.

Home_Team_Wins_By_Lead_Short.png

It shows, for example, that home teams that trailed by exactly 5 points at quarter time went on to win 52.5% of such games.

Using standard statistical techniques I've been able to determine, based on the percentages in the table and the number of games underpinning each percentage, how likely it is that the "true" proportion of wins by the home team is greater than 50% for any of the entries in the table for which the home team trails. That analysis, for example, tells us that we can be 99% confident (since the significance level is 1%) that the figure of 57.2% for teams trailing by 4 points at quarter time is statistically above 50%.

(To look for a losing leads to winning phenomenon amongst away teams I've performed a similar analysis on the rows where the home team is ahead and tested whether the proportion of wins by the home team is statistically significantly less than 50%. None of the entries was found to be significant.)
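Those standard techniques amount to one-sided tests of a proportion against 50%. Here's a sketch of one such test, with invented game counts since the raw counts behind the 57.2% aren't shown here:

```python
from scipy.stats import binomtest

# Hypothetical counts only: suppose home teams trailing by 4 points at
# quarter time won 115 of 201 such games (57.2%). One-sided test of
# whether the true win rate exceeds 50%.
result = binomtest(115, 201, p=0.5, alternative="greater")
print(result.pvalue)  # 99% confidence requires this to come in below 0.01
```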

My conclusion then is that, in the AFL, it's less likely that being slightly behind is motivational. Instead, it's that the home ground advantage is sufficient for the home team to overcome small quarter time or three-quarter time deficits. It's important to make one other point: though home teams trailing do, in some cases, win more often than they lose, they do so at a rate less than their overall winning rate, which is about 58.5%.

So far we've looked only at narrow leads and small deficits. While we're here and looking at the data in this way, let's broaden the view to consider all leads and deficits.

Home_Team_Wins_By_Lead_Long.png

In this table I've grouped leads and deficits into 5-point bands. This serves to iron out some of the bumps we saw in the earlier, more granular table.

A few things strike me about this table:

  • Home teams can expect to overcome a small quarter time deficit more often than not and need only be level at the half or at three-quarter time in order to have better than even chances of winning. That said, even the smallest of leads for the away team at three-quarter time is enough to shift the away team's chances of victory to about 55%.
  • Apparently small differences have significant implications for the outcome. A late goal in the third term to extend a lead from, say, 4 to 10 points lifts a team's chances - all else being equal - by 10 percentage points if it's the home team (ie from 64% to 74%) and by an astonishing 16 percentage points if it's the away team (ie from 64% to 80%).
  • A home team that leads by about 2 goals at the half can expect to win 8 times out of 10. An away team with a similar lead can expect to win about 7 times out of 10.

Does Losing Lead to Winning?

I was reading an issue of Chance News last night and came across the article When Losing Leads to Winning. In short, the authors of this journal article found that, in the 6,300 or so most recent NCAA basketball games, teams that trailed by 1 point at half-time went on to win more games than they lost. This they attribute to "the motivational effects of being slightly behind".

Naturally, I wondered if the same effect existed for footy.

This first chart looks across the entire history of the VFL/AFL.

Leads and Winning - All Seasons.png

The red line charts the percentage of times that a team leading by a given margin at quarter time went on to win the game. You can see that, even at the leftmost extremity of this line, the proportion of victories is above 50%. So, in short, teams with any lead at quarter time have tended to win more than they've lost and, generally, the larger the lead, the greater the proportion they've won. (Note that I've only shown leads from 1 to 40 points.)

Next, the green line charts the same phenomenon but does so instead for half-time leads. It shows the same overall trend but is consistently above the red line reflecting the fact that a lead at half-time is more likely to result in victory than is a lead of the same magnitude at quarter time. Being ahead is important; being ahead later in the game is more so.

Finally, the purple line charts the data for leads at three-quarter time. Once again we find that a given lead at three-quarter time is generally more likely to lead to victory than a similar lead at half-time, though the percentage point difference between the half-time and three-quarter lines is much less than that between the half-time and first quarter lines.

For me, one of the striking features of this chart is how steeply each line rises. A three-goal lead at quarter time has, historically, been enough to win around 75% of games, as has a two-goal lead at half-time or three-quarter time.

Anyway, there's no evidence of losing leading to winning if we consider the entire history of footy. What then if we look only at the period 1980 to 2008 inclusive?

Leads and Winning - 1980 to 2008.png

Now we have some barely significant evidence for a losing leads to winning hypothesis, but only for those teams losing by a point at quarter time (where the red line dips below 50%). Of the 235 teams that have trailed by one point at quarter time, 128 of them, or 54.5%, have gone on to win. If the true proportion is 50%, the likelihood of obtaining by chance a result of 128 or more wins is about 8.5%, so a statistician would deem that "significant" only if his or her preference was for critical values of 10% rather than the more standard 5%.
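For the record, a normal approximation to the binomial reproduces that figure (an exact binomial tail comes out a shade higher, nearer 9.5%):

```python
from math import sqrt
from scipy.stats import norm

n, wins = 235, 128
z = (wins - n * 0.5) / sqrt(n * 0.25)  # no continuity correction
print(norm.sf(z))  # roughly 0.085, i.e. the ~8.5% quoted above
```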

There is certainly no evidence for a losing leads to winning effect with respect to half-time or three-quarter time leads.

Before I created this second chart my inkling was that, with the trend to larger scores, larger leads would have been less readily defended, but the chart suggests otherwise. Again we find that a three-goal quarter time lead or a two-goal half-time or three-quarter time lead is good enough to win about 75% of matches.

Not content to abandon my preconception without a fight, I wondered if the period 1980 to 2008 was a little long and whether my inkling was specific to more recent seasons. So, I divided the 112-season history into 8 equal 14-year epochs and created the following table.

Leads and Winning - Table.png

The top block summarises the fates of teams with varying lead sizes, grouped into 5-point bands, across the 8 epochs. For example, teams that led by 1 to 5 points in any game played in the 1897 to 1910 period went on to win 55% of these games. Looking across the row you can see that this proportion has varied little across epochs never straying by more than about 3 percentage points from the all-season average of 54%.

There is some evidence in this first block that teams in the most-recent epoch have been better - not, as I thought, worse - at defending quarter time leads of three goals or more, but the evidence is slight.

Looking next at the second block there's some evidence of the converse - that is, that teams in the most-recent epoch have been poorer at defending leads, especially leads of a goal or more if you adjust for the distorting effect on the all-season average of the first two epochs (during which, for example, a four-goal lead at half-time should have been enough to send the fans to the exits).

In the third and final block there's a little more evidence of recent difficulty in defending leads, but this time it only relates to leads less than two goals at the final change.

All in all I'd have to admit that the evidence for a significant decline in the ability of teams to defend leads is not particularly compelling. Which, of course, is why I build models to predict football results rather than rely on my own inklings ...

Less Than A Goal In It

Last year, 20 games in the home and away season were decided by less than a goal, and two teams, Richmond and Sydney, were each involved in 5 of them.

Relatively speaking, the Tigers and the Swans fared quite well in these close finishes, each winning three, drawing one and losing just one of the five contests.

Fremantle, on the other hand, had a particularly bad run in close games last year, losing all four of those it played in, which contributed to an altogether forgettable year for the Dockers.

The table below shows each team's record in close games across the previous five seasons.

Close Finishes.png

Surprisingly, perhaps, the Saints head the table with a 71% success rate in close finishes across the period 2004-2008. They've done no worse than 50% in close finishes in any of the previous five seasons, during which they've made three finals appearances.

Next best is West Coast on 69%, a figure that would have been higher but for an 0 and 1 performance last year, which was also the only season in the previous five during which they missed the finals.

Richmond have the next best record, despite missing the finals in all five seasons. They're also the team that has participated in the greatest number of close finishes, racking up 16 in all, one ahead of Sydney, and two ahead of Port.

The foot of the table is occupied by Adelaide, whose 3 and 9 record includes no season with a better than 50% performance. Nonetheless they've made the finals in four of the five years.

Above Adelaide are the Hawks with a 3 and 6 record, though they are 3 and 1 for seasons 2006-2008, which also happen to be the three seasons in which they've made the finals.

So, from what we've seen already, there seems to be some relationship between winning the close games and participating in September's festivities. The last two rows of the table shed some light on this issue and show us that Finalists have a 58% record in close finishes whereas Non-Finalists have only a 41% record.

At first, that 58% figure seems a little low. After all, we know that the teams we're considering are Finalists, so they should as a group win well over 50% of their matches. Indeed, over the five year period they won about 65% of their matches. It seems then that Finalists fare relatively badly in close games compared to their overall record.

However, some of those close finishes must have been between two teams that both finished in the finals, and the Finalists' collective record in those games is, by necessity, 50% (each such game produces one Finalist winner and one Finalist loser, or two Finalists with draws). In fact, of the 69 close finishes in which Finalists appeared, 29 of them were Finalist v Finalist matchups.

When we look instead at those close finishes that pitted a Finalist against a Non-Finalist we find that there were 40 such clashes and that the Finalist prevailed in about 70% of them.
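Putting the pieces together, and calling that 70% exactly 28 wins from the 40 games: the 29 all-Finalist games contribute 29 wins from 58 Finalist appearances, so the combined record is (29 + 28) wins from (58 + 40) appearances, or 57 from 98, which is the 58% shown in the table.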

So that all seems as it should be.

Is the Competition Getting More Competitive?

We've talked before about the importance of competitiveness in the AFL and the role that this plays in retaining fans' interest because they can legitimately believe that their team might win this weekend (Melbourne supporters aside).

Last year we looked at a relatively complex measure of competitiveness that was based on the notion that competitive balance should produce competition ladders in which the points are spread across teams rather than accruing disproportionately to just a few. Today I want to look at some much simpler diagnostics based on margins of victory.

Firstly, let's take a look at the average victory margin per game across every season of the VFL/AFL.

Average_Victory_Margin.png

The trend since about the mid 1950s has been increasing average victory margins, though this seems to have been reversed at least a little over the last decade or so. Notwithstanding this reversal, in historical terms, we saw quite high average victory margins in 2008. Indeed, last year's average margin of 35.9 points was the 21st highest of all time.

Looking across the last decade, the lowest average victory margin came in 2002 when it was only 31.7 points, a massive 4 points lower than we saw last year. Post WWII, the lowest average victory margin was 23.2 points in 1957, the season in which Melbourne took the minor premiership with a 12-1-5 record.

Averages can, of course, be heavily influenced by outliers, in particular by large victories. One alternative measure of the closeness of games that avoids these outliers is the proportion of games that are decided by less than a goal or two. The following chart provides information about such measures. (The purple line shows the percentage of games won by 11 points or fewer and the green line shows the percentage of games won by 5 points or fewer. Both include draws.)

Close_Games.png

Consistent with what we found in the chart of average victory margins we can see here a general trend towards fewer close games since about the mid 1950s. We can also see an increase in the proportion of close games in the last decade.

Again we also find that, in historical terms, the proportion of close games that we're seeing is relatively low. The proportion of games that finished with a margin of 5 points or fewer in 2008 was just 10.8%, which ranks equal 66th (from 112 seasons). The proportion that finished with a margin of 11 points or fewer was just 21.1%, which ranks an even lowlier 83rd.
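Both diagnostics are simple to compute from game-level data. A sketch, assuming a file with a season column and the absolute final margin for each game (draws carrying a margin of zero):

```python
import pandas as pd

games = pd.read_csv("results_all_seasons.csv")  # assumed: season, margin (absolute)

closeness = games.groupby("season")["margin"].agg(
    average_margin="mean",
    pct_within_5=lambda m: 100 * (m <= 5).mean(),   # includes draws (margin 0)
    pct_within_11=lambda m: 100 * (m <= 11).mean(),
)
print(closeness.loc[2008])  # per the text: 35.9 average, 10.8% within 5, 21.1% within 11
```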

On balance then I think you'd have to conclude that the AFL competition is not generally getting closer, though there are some signs that the situation has been improving over the last decade or so.