Do Fans Want High-Scoring or Close Games, or Both?

We've looked at the topic of uncertainty of outcome and its effects on attendance at AFL games before, first in this piece from 2012 and then again in this piece from 2015.

In both of those write-ups, we used entropy, derived from the pre-game head-to-head probabilities, as our measure of the uncertainty in the outcome. In the first of them we found that fans prefer more uncertainty of outcome rather than less, and in the second that fans prefer the home team to be favourites, but not overwhelmingly so.

Today I want to revisit that topic, including home and away game attendance data from the period 2000 to the end of Round 13 of the 2017 season (sourced from the afltables site), and using as the uncertainty metric the Expected Margin - the Home team score less the Away team score - according to the MoSHBODS Team Rating System. There's also been a suggestion recently that fans prefer higher-scoring games, so I'll be including MoSHBODS' pre-game Expected Total data as well.

Let's begin by looking at the relationship between expected final margin (from the designated Home team's perspective) and attendance.

There are just over 3,000 games in this sample and the preliminary view from this analysis is that:

  • fans prefer games with expected margins near zero - maybe where the home team is a slight favourite
  • attendance drops off more rapidly for decreases in expected margin (ie as the home team becomes a bigger underdog) than for increases in expected margin (ie as the home team becomes a bigger favourite)

Those conclusions are broadly consistent with what we found in the earlier blogs (and with the more general "uncertainty of outcome" hypothesis by which name this topic goes in the academic literature).

There doesn't, however, appear to be much evidence in this chart of increased attendance with higher expected total scoring, an assertion that the following chart supports.

Now there's clearly a lot of variability in attendance in those charts, and whilst Expected Margin might explain some of it, by no means does it explain all of it.

One obvious variable to investigate as a source for explaining more of the variability in attendance is Home team, since some teams are likely to attract higher or lower attendances when playing at home, regardless of how competitive they are.

We see here some quite different patterns of association between Expected Margin and attendance across Home teams, with a number of teams - especially the non-Victorian ones - drawing similar crowds almost regardless of how competitive they were expected to be.

Attendances, of course, are constrained by capacity at many venues, which suggests another dimension on which we might condition the analysis.

Here we consider only the 10 venues at which at least 50 home and away games have been played during the period we're analysing, and we again see a variety of relationships between attendance and expected margin, though the more frequently-used grounds - the MCG and Docklands - do show the inverted-U shape we saw in the first chart.

We could continue to do these partial analyses on single variables at a time, but if we're to come up with an estimate of the individual contribution of Expected Margin and Expected Total to attendance we'll need to build a statistical model.

For that purpose, today I'll be creating a Multivariate Adaptive Regression Spline model (using the earth package in R), which is particularly well-suited to fitting the type of non-linear relationship we're seeing between attendance and Expected Margin.

The target variable for the regression will be Attendance, and the regressors will be:

  • Designated Home Team
  • Designated Away Team
  • Venue
  • Day of Week
  • Month of Year
  • Night Game dummy (which is Yes if the game starts after 5pm local time)
  • Same State dummy (which is Yes if the game involves two teams from the same State playing in their home state)
  • Expected Margin (from MoSHBODS, home team perspective)
  • Expected Total Score (from MoSHBODS)

We'll allow the algorithm to explore interactions, but only between pairs of variables, and we'll stop the forward search when the R-squared increases by less than 0.001.
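To make the setup concrete, here's a minimal sketch of how such a model might be specified with the earth package. It assumes a data frame (here called games, with one row per game) and illustrative column names - it is not the actual model-fitting code used for this post.

```r
# A minimal sketch of the MARS attendance model: pairwise interactions only,
# and stop the forward pass when R-squared improves by less than 0.001.
library(earth)

attendance_model <- earth(
  Attendance ~ Home.Team + Away.Team + Venue + Day.of.Week + Month +
    Night.Game + Same.State + Expected.Margin + Expected.Total,
  data   = games,    # hypothetical data frame of home and away games
  degree = 2,        # interactions between at most two variables
  thresh = 0.001     # forward-search stopping rule
)

summary(attendance_model)   # lists the hinge terms and their coefficients
```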

We obtain the model shown at right, the coefficients in which we interpret as follows:

  • A constant, which sets a baseline for the fitted attendance values and which is added to or subtracted from on the basis of the other relevant terms.
  • A block of coefficients that apply based on the game's Designated Home Team. If that's Collingwood, for example, we add 6,805 to our fitted attendance figure.
  • A block of coefficients that apply based on the game's Designated Away Team. If that's Fremantle, for example, we subtract 3,944 from our fitted attendance figure.
  • A block of coefficients for different venues. Games played at the MCG attract a 5,747 increase, for example
  • A coefficient for night games, which attract, on average, an extra 2,657 fans
  • A coefficient for games played between teams from the same State and played in their home state (for example, Adelaide v Port Adelaide at the Adelaide Oval, Brisbane v Gold Coast at the Gabba, or Melbourne v Western Bulldogs at the MCG). These games attract, on average, an additional 10,629 fans
  • A coefficient for a hinge function based on the Expected Margin. That coefficient only applies when the expression in the brackets, 3.04112 - Expected Margin, is positive - that is, only where the Expected Margin is less than 3.04112 points. For Expected Margins greater than this, the effect is zero; for Expected Margins below it, the fitted attendance falls by 118.5 people for every point by which the Expected Margin sits below that knot.
  • A coefficient for another hinge function based on the Expected Margin, but this one only applies for games that are played between teams from the same State in their home State. The hinge element adds a further restriction, meaning the coefficient applies only in games where the Expected Margin is above 3.0727.

Together, these last two terms create the relationship between attendance and Expected Margin that we saw earlier. The orange portion to the left of about a +3 Expected Margin applies to all games. For games where the Expected Margin is above about +3 points, the red portion applies if the game involves teams from different States or teams from the same State playing out of their home State (for example, in Wellington or at Marrara Oval), and the orange portion applies if the game involves teams from the same State playing in their home State (for example Sydney v GWS at the SCG).

Note that we obtain here not only the inverted-U shape, but also a relationship where attendance drops off more rapidly with negative Expected Margins than it does with positive Expected Margins.
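To make the hinge mechanics concrete, here's a small illustrative R function of my own (the knot and coefficient values are those quoted above; this is not output from the fitted model) showing how the first of those terms contributes to a fitted attendance.

```r
# Hinge term h(3.04112 - Expected Margin): active only when the Expected
# Margin is below the knot. The -118.5 coefficient is the value quoted above.
margin_hinge_term <- function(expected_margin, knot = 3.04112, coef = -118.5) {
  coef * max(0, knot - expected_margin)
}

margin_hinge_term(-20)  # home team a 20-point underdog: about -2,730 fans
margin_hinge_term(10)   # home team a 10-point favourite: zero contribution
```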

There are a few more interaction terms in the model.

  • One that involves a hinge function on Expected Total Score and that applies in games involving teams from the same State. It provides a small increment to fitted attendances for games where the Expected Total Score exceeds about 165 points. So, for these particular game types (which represent about 37% of all games), fans do appear to prefer higher scoring.
  • Some terms related to particular Home Team and Venue combinations, and to particular Away Team and Venue combinations
  • Terms for the traditional Collingwood v Carlton, and Collingwood v Essendon matchups, where Collingwood is the Designated Home Team
  • Some special terms for games involving particular teams at home or away facing other teams from their home State.
  • Terms for Monday, Tuesday and Sunday games at the MCG, the first two capturing the higher attendances at long-weekend and ANZAC day matches
  • A night game at Stadium Australia term, and a Thursday night with teams from the same State term
  • Some Home team and Expected Margin interaction terms involving hinge functions again
  • A term for games where Essendon is the Away team that lifts the fitted attendance for games where the Expected Total is above about 171 points.

The overall fit of the model is quite good, with almost 80% of the variability in attendance figures being explained (the Generalised R-squared for the model, which provides an estimate of how well the model might be expected to fit other data drawn from a similar sample, is about 76%).

Diagnostic plots reveal that there is some heteroscedasticity, however, with larger errors for games with higher fitted attendance levels.

It could be that some systematic sources of error remain and that the fit could be improved by, for example, considering the criticality of a particular game in the context of the season, or the availability or unavailability of key players. Weather, too, would doubtless play a role, and maybe even the quality of the other games in the round.

Nonetheless, this model seems a reasonable one for at least first-order estimations of the magnitudes and shapes of the relationships between attendance and Expected Margin, and between attendance and Expected Total score. Both Expected Margin and Expected Total have some influence, but the rate at which attendance varies with changes in either depends on the specifics of the game being considered - in particular, who is playing whom, and where.

Does Offence or Defence Win Games: An Historical Perspective

(This piece originally appeared in The Guardian newspaper as https://www.theguardian.com/sport/datablog/2017/jun/15/matter-of-stats-afl-datablog-offence-defence)

There are a lot of sportswriters and sports fans who are adamant that it’s impossible to compare sporting teams across seasons and eras. Taken literally, that’s a truism, because every sport evolves, and what worked for a great team in, say the 1980s, might not work – or even be within the rules of the game – today.

Still, if asked to identify some of the best AFL teams of recent years, most would almost certainly include the 2000 Essendon, 2011 Geelong, and 2012 Hawthorn teams. We all have, if nothing else, some intuitive sense of relative team abilities across time.

Rating Teams

As imperfect as it is, one way of quantifying a team's relative ability is to apply mathematics to the results it achieves, adjusting for the quality of the teams it faced. Adjustment for opponent quality is important because, were we to use just raw results, a 90-point thrashing of a struggling team would be treated no differently in our assessment of a team's ability than a similar result against a talented opponent.

This notion of continuously rating individuals or teams using results adjusted for opposition quality has a long history and one version of the method can be traced back to Arpad Elo, who came up with it as a way of rating chess players as they defeated, drew or lost to other players of sometimes widely differing abilities. It’s still used for that purpose today.

In sports like football, Elo-style rating systems can be expanded to provide not just a single rating for a team, but a separate rating for its offensive and defensive abilities, the former based on the team’s record of scoring points relative to the quality of the defences it has faced, and the latter on its record of preventing points being scored relative to the quality of the offences it has faced.

If we do this for the AFL we can quantify the offensive and defensive abilities of teams within and across seasons using a common currency: points.

There are many ways to do this, and a number of websites offer their own versions, but the methodology we’ll use here has the following key characteristics:

  • Teams start with offensive and defensive ratings of zero and this is therefore the rating of an average team in every era
  • Ratings are adjusted each week on the basis of offensive and defensive performance relative to expectation
  • Final scores are adjusted to account for the fact that on-the-day accuracy – the proportion of scoring shots registered as goals – can be shown empirically to have a large random component
  • Every team has a “venue performance value” for every ground, which measures how much better or worse, on average, that team has historically performed at that venue. These values tend to be positive for teams’ home grounds and negative for their away grounds
  • Teams carry across a proportion of their end-of-season rating into the next season, which reflects the general stability in lists and playing styles from one season to the next
  • Adjustments made to team ratings on the basis of results relative to expectations tend to be larger in the early parts of the season to allow the ratings to more rapidly adjust to any changes in personnel, style or ability
  • The numerical value of the ratings has a direct interpretation. For example, a team with a +10 offensive rating would be expected to score 10 more points than an average team when facing an average opponent at a neutral venue. Similarly, a team with a -10 defensive rating would be expected to concede 10 more points than an average team when facing an average opponent at a neutral venue. 
  • To obtain a combined rating for a team we can simply add its offensive and defensive rating

(For more details see this blog post on the MoSHBODS Team Rating System)
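As a rough illustration of the weekly update step, here's a simplified sketch in R. The average score and learning rate are placeholder values, and it omits the venue performance, accuracy and season-carryover adjustments listed above - it is not the MoSHBODS implementation.

```r
# Simplified Elo-style update with separate offensive and defensive ratings,
# both expressed in points relative to an average team.
update_ratings <- function(team, opponent, points_for, points_against,
                           avg_score = 90, k = 0.1) {
  exp_for     <- avg_score + team$off - opponent$def   # expected points scored
  exp_against <- avg_score + opponent$off - team$def   # expected points conceded

  team$off <- team$off + k * (points_for - exp_for)          # scored more than expected: offence up
  team$def <- team$def + k * (exp_against - points_against)  # conceded less than expected: defence up
  team
}

team_a <- list(off = 5, def = -2)
team_b <- list(off = 0, def = 3)
update_ratings(team_a, team_b, points_for = 110, points_against = 85)
```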

End of Home and Away Season Ratings in the Modern Era

Applying this methodology generates the data in the chart below, which records the offensive and defensive ratings of every team from seasons 2000 to 2016 as at the end of their respective home and away season. Teams that ultimately won the Flag are signified by dots coloured red, and those that finished as Runner Up as dots coloured orange. The grey dots are the other teams from each season – those that missed the Grand Final.

We see that teams lie mostly in the bottom-left and top-right quadrants, which tells us that teams from the modern era that have been above-average offensively have also tended to be above-average defensively, and conversely that below-average offensive teams have tended to be below-average defensively as well.

The level of association between teams’ offensive and defensive ratings can be measured using something called a correlation coefficient, which takes on values between -1 and +1. Negative values imply a negative association – say if strong offensive teams tended to be weak defensively and vice versa – while positive values imply a positive association, such as we see in the chart.
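In R, that amounts to something like the following (assuming a data frame of end-of-season ratings with Offence and Defence columns; the names are illustrative):

```r
# Pearson correlation between teams' offensive and defensive ratings;
# values near +1 indicate that strong offences tend to pair with strong defences.
cor(ratings$Offence, ratings$Defence)
```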

The correlation coefficients for team ratings in the current and previous eras appear in the table at right. We see that the degree of association between team offensive and defensive ratings has been at historically high levels in the modern era. In fact, it's not been as high as this since the earliest days of the VFL.

In other words, teams' offensive and defensive ratings have tended to be more similar than different in the modern era.

By way of comparison, here’s the picture for the 1980 to 1999 era in which the weaker relationship between teams’ offensive and defensive ratings is apparent.

Note that the increase in correlation between teams' offensive and defensive abilities in the modern era has not come with much of a reduction in the spread of team abilities. If we ignore the teams that are in the lowest and highest 5% on offensive and defensive abilities, the range of offensive ratings in the modern era spans about 31 points and the range of defensive ratings about 34 points. For the 1980-1999 era the equivalent ranges are both about 2 points larger.
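That "ignore the extremes" calculation might look something like this (a sketch with assumed column names, not the actual code used):

```r
# Span of ratings after discarding the lowest and highest 5%.
trimmed_span <- function(x) unname(diff(quantile(x, c(0.05, 0.95))))

trimmed_span(modern_ratings$Offence)   # roughly 31 points in the modern era
trimmed_span(modern_ratings$Defence)   # roughly 34 points
```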

One plausible hypothesis for the cause of the closer association between the offensive and defensive abilities of modern teams would be that coaching and training methods have improved and served to reduce the level of independent variability in the two skill sets.

The charts for both eras have one thing in common, however: the congregation of Grand Finalists – the orange and red dots – in the north-eastern corner. This is as we might expect because this is the quadrant for teams that are above-average both offensively and defensively.

Only a handful of Grand Finalists in either era have finished their home and away season with below-average offensive or defensive ratings. And, in the modern era, just two eventual Grand Finalists have gone into the finals with below-average defensive ratings - Melbourne 2000 and Port Adelaide 2007, both of which finished as runners up in their respective seasons.

Melbourne finished its home and away season conceding 100 points or more in 4 of its last 7 games, and conceding 98 and 99 points in two others. Those results took a collective toll on its defensive rating.

Port Adelaide ended their 2007 home and away season more positively but probably not as well as a team second on the ladder might have been expected to – an assessment that seems all the more reasonable given the Grand Final result just a few weeks later. In that 2007 Grand Final, Geelong defeated them by 119 points.

The chart for the modern era also highlights a few highly-rated teams that could consider themselves unlucky to have not made the Grand Final in their years – the Adelaide 2016 and St Kilda 2005 teams in particular, though the Saints' rating was somewhat elevated by their 139-point thrashing of the Lions in the final home and away game of that season.

Based on the relatively small sample of successful teams shown in this chart, it’s difficult to come to any firm conclusions about the relative importance of offensive versus defensive ability for making Grand Finals and winning Flags, and impossible to say anything at all about their relative importance in getting a team to the finals in the first place.

To look at that issue we use the ratings in a slightly different way. Specifically, we use them to calculate the winning rates of teams classified on the basis of their offensive and defensive superiority or inferiority at the time of their clash.

Those calculations are summarised in the table below, which also groups games into eras to iron out season to season fluctuations and make underlying differences more apparent.
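The underlying calculation is straightforward: classify each game by which side was rated stronger offensively and defensively going in, then tabulate win rates by era and game type. A sketch (with assumed column names, not the actual code) might look like this:

```r
# Sketch of the win-rate tabulation, assuming one row per game with pre-game
# ratings for both sides and a HomeWin indicator (0, 0.5 or 1).
library(dplyr)

results %>%
  mutate(
    off_superior = HomeOffRating > AwayOffRating,
    def_superior = HomeDefRating > AwayDefRating
  ) %>%
  group_by(Era, GameType, off_superior, def_superior) %>%
  summarise(games = n(), win_rate = mean(HomeWin), .groups = "drop")
```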

The percentages that are most interesting are those in the left-most column in each block.

They tell us how successful teams have been when they were stronger defensively but weaker offensively than their opponents.

What we find is that, in every era since WWII:

  • in home and away games, teams that were superior defensively and weaker offensively have won slightly more than 50% of their games
  • in finals, in contrast, teams that were superior offensively and weaker defensively have won slightly more than 50% of their games

We should note though that none of the percentages are statistically significantly different from 50%, so we can’t definitively claim that, in any particular era, defensive superiority has been preferable to offensive superiority in the home and away season or that the opposite has been true in finals. That’s the clear tendency, but the evidence is statistically weak, so the differences we see might be no more than random noise.

In any case, the effect sizes we see are quite small – around 1 to 2 percentage points – so practically it makes more sense to conclude that offensive and defensive abilities have been historically of roughly equal importance to a team's success in home and away games and in finals.

The Teams of 2017

So, where do the current crop of teams sit?

The chart below maps each of the 18 current teams’ ratings as at the end of Round 12 and the ratings of all 34 Grand Finalists from the period 2000-2016 as at the end of Round 12 in their respective years.

Adelaide stand alone offensively, with a rating almost as good as the 2000 Essendon team, who were 12 and 0 after Round 12, having averaged just over 136 points per game in a year where the all-team average was 103 points per team per game across the entire home and away season. The Dons were then scoring at a rate just over 30% higher than an average team.

This year, Adelaide are averaging just under 119 points per game in a season where the all-team average is just under 91 points per game, which is also about 30% higher. They are, clearly, a formidable team offensively, though they’ve yet to impress consistently defensively.

The 2017 Port Adelaide and GWS teams come next, both located just outside the crop of highest-rated Grand Finalists, and having combined ratings a little below Adelaide’s. This week’s loss to Essendon had a (quite reasonably) significant effect on Port Adelaide’s rating, as did GWS’ loss to Carlton.

Geelong, Collingwood, Sydney, Richmond and the Western Bulldogs are a little more south-east of that prime Flag-winner territory, and would require a few above-expectation performances in upcoming weeks to enter that area. The Bulldogs in particular would need to show a little more offensive ability to push into the group, though they had a similar rating at the same point last season, so who’s to say they need to do anything much more.

Collingwood’s relatively high rating might raise a few eyebrows, but they have, it should be noted, generated more scoring shots in their losses to the Western Bulldogs in Round 1 and Essendon in Round 5, and generated only four or fewer less scoring shots in their losses to Richmond in Round 2, St Kilda in Round 4, Carlton in Round 7, GWS in Round 8, and Melbourne in Round 12. They’re currently ranked 7th on combined rating.

Essendon, Melbourne and St Kilda form the next sub-group – rated slightly above average on combined rating but below almost all previous Grand Finalists at the equivalent point in the season.

No other team has a combined rating that is positive or that exceeds that of any Flag winner at this point in the season since 2000. As such, the remaining seven teams would make history were they to win the Flag.

Still, there’s a lot that can happen between now and the end of the season, as we can see in this final chart, which shows 2017 team ratings and the ratings of all non-Grand Finalists from the seasons from 2000 to 2016.

There are plenty of sides in the chart that were rated very highly at the end of Round 12 that never got as far as Grand Final day.

For example, the Geelong 2010 team was 10 and 2 after 12 rounds, one game clear at the head of the competition ladder with a 156 percentage. That team went 7 and 3 over the remainder of the home and away season to finish runners up in the minor premiership before being eliminated by the minor premiers, Collingwood, 120-79 in a Preliminary Final.

And, in any case, in a year where results have constantly surprised and where two wins currently separate 5th from 17th on the ladder, no team can reasonably feel assured of progressing into September, let alone to the MCG on the 30th.

Injecting Variability into Season Projections: How Much is Too Much?

I've been projecting final ladders during AFL seasons for at least five years now, where I take the current ladder and project the remainder of the season thousands of times to make inferences about which teams might finish where (here, for example, is a projection from last year). During that time, more than once I've wondered about whether the projections have incorporated sufficient variability - whether the results have been overly-optimistic for strong teams and unduly pessimistic for weak teams.

Read More

An In-Running Model for the Total Score of an AFL Game

A few weeks ago, I wrote a piece describing the construction of an in-running model for the final margin of an AFL game. Today, I'm going to use the same data set (viz, score progression data from the www.afltables.com website, covering every score in every AFL game from 2008 to 2016) to construct a different in-running model, this one to project the final total score.

Read More

Selected AFL Twitter Networks: Graph Theory and Footy

Only a few times in my professional career as a data scientist have I had the opportunity to use mathematical graph theory, but the technique has long fascinated me.

Briefly, the theory involves "nodes" (also called vertices), which are entities like books, teams or streets, and "edges", which signify relationships between the nodes - such as, in the books example, having the same author. Edges can denote present/absent relationships such as friendship, or they can denote cardinality, such as the number of times a pair of teams have played. Where the relationship between two nodes is mutual rather than flowing from one to the other (eg friendship), the edges are said to be undirected; where it flows from one node to another (eg Team A defeated Team B), they're said to be directed.
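By way of a toy example (hypothetical fixtures, using the igraph package), the distinction looks like this:

```r
# Toy illustration of undirected vs directed graphs with igraph.
library(igraph)

# Undirected: "these teams have played each other"
played <- graph_from_literal(Richmond - Carlton, Carlton - Collingwood)

# Directed: "this team defeated that team" (the arrow points at the loser here)
defeated <- graph_from_literal(Richmond -+ Carlton, Collingwood -+ Carlton)

is_directed(played)    # FALSE
is_directed(defeated)  # TRUE
```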

Read More

The Case of the Missing Margins (Are 12 to 24-point Margins Too Rare?)

The analysis used in this blog was originally created as part of a Twitter conversation about the ability of good teams to "win the close ones" (a topic we have investigated before here on MoS - for example in this post and in this one). As a first step in investigating that question, I thought it would be useful to create a cross-tab of historical V/AFL results based on the final margin in each game and the level of pre-game favouritism.

Read More

Team Rating Revisited: A Rival for MoSSBODS

Last year, predictions based on the MoSSBODS Team Rating System proved themselves to be, in Aussie parlance, "fairly useful". MoSSBODS correctly predicted 73% of the winning teams and recorded a mean absolute error (MAE) of 30.2 points per game, its opinions guiding the Combined Portfolio to a modest profit for the year. If it had a major weakness, it was in its head-to-head probability assessments, which, whilst well-calibrated in the early part of the season, were at best unhelpful from about Round 5 onwards.

Read More

The 2017 AFL Draw: Difficulty and Distortion Dissected

I've seen it written that the best blog posts are self-contained. But as this is the third year in a row where I've used essentially the same methodology for analysing the AFL draw for the upcoming season, I'm not going to repeat the methodological details here. Instead, I'll politely refer you to this post from last year, and, probably more relevantly, this one from the year before if you're curious about that kind of thing. Call me lazy - but at least this year you're getting the blog post in October rather than in November or December.

Read More

Classifying Grand Finals (A Reprise)

(This piece originally appeared in the Guardian, and revisits the topic of defining a typology for Grand Finals, which I first looked at in 2009 where I came up with a similar solution, and again in 2014 where I used a fuzzy clustering approach.)

For fans, even casual ones, AFL Grand Finals are special, and each etches its own unique, defining legacy on the collective football memory. 

Read More

Team Ratings and Conversion Rates

A number of blog posts here in the Statistical Analysis portion of the MoS website have reviewed the rates at which teams have converted Scoring Shots into goals - a metric I refer to as the "Conversion Rate".

In this post from 2014 for example, which is probably the post most similar in intent to today's, I used Beta regression to model team conversion rates:

  1. as a function of venue, and the participating teams' pre-game bookmaker odds, venue experience, MARS Ratings, and recent conversion performance. 
  2. as a function of which teams were playing

Both models explained about 2.5 - 3% of the variability in team conversion rates, but the general absence of statistically significant coefficients in the first model meant that only tentative conclusions could be drawn from it. And, whilst some teams had statistically significant coefficients in the second model, its ongoing usefulness was dependent on an assumption that these team-by-team effects would persist across a reasonable portion of the future. We know, however, that teams go through phases of above- and below-average conversion rates, so that assumption seems dubious.

Other analyses have revealed that stronger teams generally convert at higher rates when playing weaker teams, so it's curious that the first model in that 2014 post did not have statistically significant coefficients on the MARS Ratings variable.

Perhaps MoSSBODS, which provides separate offensive and defensive ratings, can help.

THE MODEL

For today's analysis we will again be employing a Beta regression (though this time with a logit link and not fitting phi as a function of the covariates), applying it to all games from the period from Round 1 of 2000 to Round 16 of 2016.

We'll use as regressors:

  • A team's pre-game Offensive and Defensive MoSSBODS Ratings
  • Their opponent's pre-game Offensive and Defensive MoSSBODS Ratings
  • The game venue
  • The (local) time of day when the game started
  • The month in which the game was played
  • The attendance at the game

(Note that the attendance and time-of-day data has been sourced from the extraordinary www.afltables.com site.)
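A minimal sketch of such a model specification, using the betareg package and assuming a data frame with one row per team per game and illustrative column names (this is not the actual fitting code), might look like this:

```r
# Beta regression of Conversion Rate (a proportion strictly between 0 and 1)
# with a logit link and a constant precision parameter phi.
library(betareg)

conversion_model <- betareg(
  Conversion.Rate ~ Own.Off.Rating + Own.Def.Rating +
    Opp.Off.Rating + Opp.Def.Rating +
    Venue + Time.of.Day + Month + Attendance,
  data = team_games,   # hypothetical data frame, one row per team per game
  link = "logit"
)

summary(conversion_model)
```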

Now, in recent conversations I've been having on Twitter and elsewhere people have been positing that:

  • better teams will, on average, create better scoring shot opportunities and so will convert at higher rates than weaker teams. In particular, teams with stronger attacks playing teams with weaker defences should show heightened rates of conversion.
  • dew and/or wet weather will generally depress scoring, partly because it will be harder to create better scoring opportunities in the first place, and also because any opportunity will be harder to convert than it would be from the same part of the ground were the weather more conducive to long and accurate kicking.

What's appealing about including MoSSBODS ratings as regressors is that they allow us to explicitly consider the first argument above. If that contention is true, we'd expect to see a positive and significant coefficient on a team's own Offensive rating and a negative and significant coefficient on its opponent's Defensive rating.

On the second argument, whilst I don't have direct weather data for every game and so cannot reflect the presence or absence of rain, I can proxy for the likelihood of dew in the regression by including the variables related to the time of day that the game started and the month in which it was played.

Looking at the remaining regressors, venue is included based on earlier analyses that suggested conversion rates varied significantly around the all-ground average at some venues, and attendance is included to test the hypothesis that teams may respond positively or negatively in their conversion behaviour in the presence of larger- or smaller-than-average crowds.

THE RESULTS

Details of the fitted model appear below.

The logit formulation makes coefficient interpretation slightly tricky. We need firstly to recognise that estimates are relative to a notional "reference game", which for the model as formulated is a game played at the MCG, starting before 4:30pm and played in April.

The intercept coefficient of the model tells us that such a game, played between two teams with MoSSBODS Offensive and Defensive ratings of 0 (ie 'average' teams) would be expected to produce Conversion rates of 53.1% for both teams. We calculate that as 1/(1+exp(-0.126)). 

(Strictly, we should include some value for Attendance in this calculation, but the coefficient is so small that it makes no practical difference in our estimate whether we do or don't.)
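For anyone wanting to check the arithmetic, here it is as a couple of lines of R (the 0.126 intercept is the value quoted above; inv_logit is just a helper defined here, not part of the model output):

```r
# Inverse of the logit link: converts the linear predictor to a conversion rate.
inv_logit <- function(x) 1 / (1 + exp(-x))
inv_logit(0.126)   # ~0.531, the 53.1% conversion rate for the reference game
```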

Next, let's consider the four coefficients reflecting MoSSBODS ratings variables. We find, as hypothesised, that the coefficient for a team's own Offensive rating is positive and significant, and that for their opponent's Defensive rating is negative and significant.

Their size means that, for example, a team with a +1 Scoring Shot (SS) Offensive rating and a 0 SS Defensive rating, playing a team with 0 SS Defensive and Offensive ratings, would be expected to convert at 53.3%, which is just 0.2 percentage points higher than the rate in the 'reference game'. This is calculated as 1/(1+exp(-(0.126+0.008))).

Strong Offensive teams will have ratings of +5 SS or even higher, in which case the estimated conversion rate would rise to just over 54%.

Similarly, a team with 0 SS Defensive and Offensive ratings facing an opponent with a +1 Scoring Shot (SS) Defensive rating and a 0 SS Offensive rating would be expected to convert at 52.8%, which is about 0.3 percentage points lower than the rate for the 'reference game'.

The positive and statistically significant coefficient on a team's opponent's Offensive rating is a curious result. It suggests that teams convert at a higher rate themselves when facing an opposition with a stronger Offence, as compared to one with a weaker Offence. That opponent would, of course, be expected to convert at a higher-than-average rate itself, all other things being equal, so perhaps it's the case that teams strive to create better scoring shot opportunities when faced with an Offensively more capable team, looking to convert less promising near-goal opportunities into better ones before taking a shot at goal.

In any case, the coefficient is only 0.004, about half the size of the coefficient on a team's own Offensive rating, and about one-third the size of that on the team's opponent's Defensive Rating, so the magnitude of the effect is relatively small.

To the venue-based variables then, where we see that three grounds have statistically significant coefficients. In absolute terms, Cazaly's Stadium's is largest, and negative, and we would expect a game played there between two 'average' teams, starting before 4:30pm in April to result in conversion rates of around 46%.

Docklands has the largest positive coefficient and there we would expect a game played between the same two teams at the same time to yield conversion rates of around 56%.

The coefficients on the Time of Day variables very much support the hypothesis that games starting later tend to have lower conversion rates. For example, a game starting between 4:30pm and 7:30pm played between 'average' teams at the MCG would be expected to produce conversion rates of just over 52%. A later-starting game would be expected to produce a fractionally lower conversion rate.

Month, it transpires, is also strongly associated with variability in conversion rates, with games played in any of the months May to August expected to produce higher conversion rates than those played in April. A game between 'average' teams, at the MCG, starting before 4:30pm and taking place in any of those months would be expected to produce conversion rates of around 54%, which is almost 1 percentage point higher than would be expected for the same game in April. The Month variable, then, does not seem to be proxying for poorer weather.

Relatively few games in the sample were played in March (150) so, for the most part, April games were the first few games of the season. As such, the higher rates of conversion in other months might simply reflect an overall improvement in the quality and conversion of scoring shot opportunities once teams have settled into the new season.

Lastly, it turns out that attendance levels have virtually no effect on team conversion rates.

SUMMARY

It's important to interpret all of these results in the context of the model's pseudo R-squared, which is, again, around 2.5%. That means the vast majority of the variability in teams' conversion rates is unexplained by anything in the model (and, I would contend, potentially unexplainable pre-game). Any conversion rate forecasts from the model will therefore have very large error bounds. That's the nature of a measure as lumpy and variable as Conversion Rate, which can move by tens of percentage points in a single game on the basis of a few behinds becoming goals or vice versa.

That said, we have detected some fairly clear "signals" and can reasonably claim that conversion rates are:

  • Positively associated with a team's Offensive rating
  • Negatively associated with a team's opponent's Defensive rating
  • Positively associated with a team's opponent's Offensive rating
  • Higher (compared to the MCG) at Docklands, and lower at Cazaly's Stadium and Carrara
  • Lower for games starting at 4:30pm or later compared to games starting before then
  • Higher (relative to April) for games played between May and August
  • Unrelated to attendance

Taken across a large enough sample of games, it's clear that these effects do become manifest, and that they are large enough, despite the vast sea of randomness they are diluted in, to produce detectable differences.

Next year I might see if they're large enough to improve MoSSBODS score projections because, ultimately, what matters most is whether the associations we find prove to be predictively useful.