The 2017 AFL Draw: Difficulty and Distortion Dissected

I've seen it written that the best blog posts are self-contained. But as this is the third year in a row where I've used essentially the same methodology for analysing the AFL draw for the upcoming season, I'm not going to repeat the methodological details here. Instead, I'll politely refer you to this post from last year, and, probably more relevantly, this one from the year before if you're curious about that kind of thing. Call me lazy - but at least this year you're getting the blog post in October rather than in November or December.

Read More

Classifying Grand Finals (A Reprise)

(This piece originally appeared in the Guardian, and revisits the topic of defining a typology for Grand Finals, which I first looked at in 2009 where I came up with a similar solution, and again in 2014 where I used a fuzzy clustering approach.)

For fans, even casual ones, AFL Grand Finals are special, and each etches its own unique, defining legacy on the collective football memory. 

Read More

Team Ratings and Conversion Rates

A number of blog posts here in the Statistical Analysis portion of the MoS website have reviewed the rates at which teams have converted Scoring Shots into goals - a metric I refer to as the "Conversion Rate".

In this post from 2014 for example, which is probably the post most similar in intent to today's, I used Beta regression to model team conversion rates:

  1. as a function of venue, and the participating teams' pre-game bookmaker odds, venue experience, MARS Ratings, and recent conversion performance. 
  2. as a function of which teams were playing.

Both models explained about 2.5 - 3% of the variability in team conversion rates, but the general absence of statistically significant coefficients in the first model meant that only tentative conclusions could be drawn from it. And, whilst some teams had statistically significant coefficients in the second model, its ongoing usefulness was dependent on an assumption that these team-by-team effects would persist across a reasonable portion of the future. We know, however, that teams go through phases of above- and below-average conversion rates, so that assumption seems dubious.

Other analyses have revealed that stronger teams generally convert at higher rates when playing weaker teams, so it's curious that the first model in that 2014 post did not have statistically significant coefficients on the MARS Ratings variable.

MoSSBODS, which provides separate offensive and defensive ratings, might help.

THE MODEL

For today's analysis we will again be employing a Beta regression (though this time with a logit link and not fitting phi as a function of the covariates), applying it to all games from Round 1 of 2000 to Round 16 of 2016.

We'll use as regressors:

  • A team's pre-game Offensive and Defensive MoSSBODS Ratings
  • Their opponent's pre-game Offensive and Defensive MoSSBODS Ratings
  • The game venue
  • The (local) time of day when the game started
  • The month in which the game was played
  • The attendance at the game

(Note that the attendance and time-of-day data has been sourced from the extraordinary www.afltables.com site.)
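
For readers who'd like to experiment with something similar, here's a minimal sketch of such a model using Python's statsmodels (this isn't the code used for the analysis here, and the input file and all of the column names are hypothetical):

```python
# A minimal sketch of a Beta regression with a logit link and constant phi,
# which is BetaModel's default behaviour. Data file and columns are hypothetical.
import pandas as pd
from statsmodels.othermod.betareg import BetaModel

games = pd.read_csv("games.csv")  # hypothetical: one row per team per game

# Conversion Rate = Goals / Scoring Shots, nudged off 0 and 1 because a
# Beta-distributed response must lie strictly inside (0, 1).
eps = 1e-4
games["conv_rate"] = (games["goals"] / games["scoring_shots"]).clip(eps, 1 - eps)

model = BetaModel.from_formula(
    "conv_rate ~ own_off_rating + own_def_rating"
    " + opp_off_rating + opp_def_rating"
    " + C(venue) + C(start_bucket) + C(month) + attendance",
    data=games,
)
result = model.fit()
print(result.summary())
```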

Now, in recent conversations I've been having on Twitter and elsewhere, people have been positing that:

  • better teams will, on average, create better scoring shot opportunities and so will convert at higher rates than weaker teams. In particular, teams with stronger attacks playing teams with weaker defences should show heightened rates of conversion.
  • dew and/or wet weather will generally depress scoring, partly because it will be harder to create better scoring opportunities in the first place, and also because any opportunity will be harder to convert than it would be from the same part of the ground were the weather more conducive to long and accurate kicking.

What's appealing about including MoSSBODS Ratings as regressors is that they allow us to explicitly consider the first argument above. If that contention is true, we'd expect to see a positive and significant coefficient on a team's own Offensive rating and a negative and significant coefficient on a team's opponent's Defensive rating.

On the second argument, whilst I don't have direct weather data for every game and so cannot reflect the presence or absence of rain, I can proxy for the likelihood of dew in the regression by including the variables related to the time of day that the game started and the month in which it was played.
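
To make that proxying concrete, here's one way the two variables might be derived from a local start time (a sketch only; the cut-points and the column names are assumptions):

```python
# Hypothetical derivation of the time-of-day and month regressors.
import pandas as pd

games = pd.read_csv("games.csv")  # hypothetical, as in the earlier sketch
games["start_local"] = pd.to_datetime(games["start_local"])

def start_bucket(ts):
    # Cut-points assumed from the discussion of the results below.
    minutes = ts.hour * 60 + ts.minute
    if minutes < 16 * 60 + 30:
        return "before 4:30pm"
    if minutes < 19 * 60 + 30:
        return "4:30pm to 7:30pm"
    return "after 7:30pm"

games["start_bucket"] = games["start_local"].apply(start_bucket)
games["month"] = games["start_local"].dt.month_name()
```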

Looking at the remaining regressors, venue is included based on earlier analyses that suggested conversion rates varied significantly around the all-ground average for some venues, and attendance is included to test the hypothesis that teams may respond positively or negatively in their conversion behaviour in the presence of larger- or smaller-than-average crowds.

THE RESULTS

Details of the fitted model appear below.

The logit formulation makes coefficient interpretation slightly tricky. We need firstly to recognise that estimates are relative to a notional "reference game", which for the model as formulated is a game played at the MCG, starting before 4:30pm and played in April.

The intercept coefficient of the model tells us that such a game, played between two teams with MoSSBODS Offensive and Defensive ratings of 0 (ie 'average' teams) would be expected to produce Conversion rates of 53.1% for both teams. We calculate that as 1/(1+exp(-0.126)). 

(Strictly, we should include some value for Attendance in this calculation, but the coefficient is so small that it makes no practical difference in our estimate whether we do or don't.)
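
If you'd like to check that arithmetic yourself, the inverse-logit calculation is a one-liner:

```python
# Verifying the reference-game estimate quoted above.
import math

def inv_logit(x):
    # Map a value on the linear-predictor scale back to a conversion rate.
    return 1.0 / (1.0 + math.exp(-x))

print(round(inv_logit(0.126), 3))  # 0.531, ie the 53.1% quoted above
```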

Next, let's consider the four coefficients reflecting MoSSBODS ratings variables. We find, as hypothesised, that the coefficient for a team's own Offensive rating is positive and significant, and that for their opponent's Defensive rating is negative and significant.

Their size means that, for example, a team with a +1 Scoring Shot (SS) Offensive rating and a 0 SS Defensive rating playing a team with a 0 SS Defensive and Offensive rating would be expected to convert at 53.3%, which is just 0.2 percentage points higher than the rate in the 'reference game'. This is calculated as 1/(1+exp(-(0.126+0.008))).

Strong Offensive teams will have ratings of +5 SS or even higher, in which case the estimated conversion rate would rise to just over 54%.

Similarly, a team facing an opponent with a +1 Scoring Shot (SS) Defensive rating and a 0 SS Offensive rating, itself having 0 SS Defensive and Offensive ratings, would be expected to convert at 52.8%, which is about 0.3 percentage points lower than the rate for the 'reference game'.
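
These scenario estimates can be reproduced the same way. Note that the +0.008 coefficient on a team's own Offensive rating is quoted above, but the -0.012 I've used for the opponent's Defensive rating is my own inference from the relative coefficient sizes discussed below, so treat it as approximate:

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

intercept = 0.126
own_off = 0.008    # quoted above
opp_def = -0.012   # inferred from the relative coefficient sizes discussed below

print(round(inv_logit(intercept + 1 * own_off), 3))  # 0.533: a +1 SS attack
print(round(inv_logit(intercept + 5 * own_off), 3))  # 0.541: a +5 SS attack
print(round(inv_logit(intercept + 1 * opp_def), 3))  # 0.528: facing a +1 SS defence
```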

The positive and statistically significant coefficient on a team's opponent's Offensive rating is a curious result. It suggests that teams convert at a higher rate themselves when facing an opposition with a stronger Offence, as compared to one with a weaker Offence. That opponent would, of course, be expected to convert at a higher-than-average rate itself, all other things being equal, so perhaps it's the case that teams strive to create better scoring shot opportunities when faced with an Offensively more capable team, looking to convert less promising near-goal opportunities into better ones before taking a shot at goal.

In any case, the coefficient is only 0.004, about half the size of the coefficient on a team's own Offensive rating, and about one-third the size of that on the team's opponent's Defensive Rating, so the magnitude of the effect is relatively small.

To the venue-based variables then, where we see that three grounds have statistically significant coefficients. In absolute terms, Cazaly's Stadium's is largest, and negative, and we would expect a game played there between two 'average' teams, starting before 4:30pm in April to result in conversion rates of around 46%.

Docklands has the largest positive coefficient and there we would expect a game played between the same two teams at the same time to yield conversion rates of around 56%.

The coefficients on the Time of Day variables very much support the hypothesis that games starting later tend to have lower conversion rates. For example, a game starting between 4:30pm and 7:30pm played between 'average' teams at the MCG would be expected to produce conversion rates of just over 52%. A later-starting game would be expected to produce a fractionally lower conversion rate.

Month, it transpires, is also strongly associated with variability in conversion rates, with games played in any of the months May to August expected to produce higher conversion rates than those played in April. A game between 'average' teams, at the MCG, starting before 4:30pm and taking place in any of those months would be expected to produce conversion rates of around 54%, which is almost 1 percentage point higher than would be expected for the same game in April. The Month variable then does not seem to be proxying for poorer weather.

Relatively few games in the sample were played in March (150) so, for the most part, April games were the first few games of the season. As such, the higher rates of conversion in other months might simply reflect an overall improvement in the quality and conversion of scoring shot opportunities once teams have settled into the new season.
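
Since the full coefficient table isn't reproduced in the text above, one way to recover approximate implied coefficients for these categorical effects is to difference the logits of the quoted scenario rates (approximate only, because the quoted rates are rounded):

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

ref = 0.531  # 'average' teams at the MCG, before 4:30pm, in April

for label, rate in [
    ("Cazaly's Stadium", 0.46),
    ("Docklands", 0.56),
    ("4:30pm to 7:30pm start", 0.52),
    ("May to August", 0.54),
]:
    # eg Cazaly's comes out at around -0.28 on the linear-predictor scale.
    print(f"{label}: implied coefficient {logit(rate) - logit(ref):+.3f}")
```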

Lastly, it turns out that attendance levels have virtually no effect on team conversion rates.

SUMMARY

It's important to interpret all of these results in the context of the model's pseudo R-squared, which is, again, around 2.5%. That means the vast majority of the variability in teams' conversion rates is unexplained by anything in the model (and, I would contend, potentially unexplainable pre-game). Any conversion rate forecasts from the model will therefore have very large error bounds. That's the nature of a measure as lumpy and variable as Conversion Rate, which can move by tens of percentage points in a single game on the basis of a few behinds becoming goals or vice versa (for a team with 20 scoring shots, for example, every behind that becomes a goal adds 5 percentage points to its conversion rate).

That said, we have detected some fairly clear "signals" and can reasonably claim that conversion rates are:

  • Positively associated with a team's Offensive rating
  • Negatively associated with a team's opponent's Defensive rating
  • Positively associated with a team's opponent's Offensive rating
  • Higher (compared to the MCG) at Docklands, and lower at Cazaly's Stadium and Carrara
  • Lower for games starting at 4:30pm or later compared to games starting before then
  • Higher (relative to April) for games played between May and August
  • Unrelated to attendance

Taken across a large enough sample of games, it's clear that these effects do become manifest, and that they are large enough, despite the vast sea of randomness they are diluted in, to produce detectable differences.

Next year I might see if they're large enough to improve MoSSBODS score projections because, ultimately, what matters most is whether the associations we find prove to be predictively useful.

Establishing Metrics for Margin, Total Score and Team Score Predictions

These days, I reckon I know what a good margin forecaster looks like. Any person or algorithm - and I'm still at the point where I think there's a meaningful distinction to be made there - who (that?) can consistently predict margins within 30 points of the actual result is unarguably competent. That benchmark is based on the empirical performances I've seen from others and measured for my own forecasting models across the last decade of analysing football. 

Read More