Matter of Stats


AFL Crowds and Optimal Uncertainty

Fans the world over, the literature shows, like a little uncertainty in their sports. AFL fans are no different, as I recounted in a 2012 blog entitled "Do Fans Really Want Close Games?", in which I described regressions showing that crowds were larger at games where the level of expected surprisal or 'entropy' was higher.

The entropy measure I used in those regressions treated the source of any uncertainty symmetrically in that, for example, a game where the home team was a 70% favourite was assessed as having the same level of entropy - and hence the same level of interest to an intending attendee - as a game where the away team was the 70% favourite. This agnosticism towards uncertainty might or might not reflect the reality of fans' behaviour. Some recognition was given in those earlier regression models to the likely greater importance of home team favouritism by including a dummy variable reflecting whether or not the home team was the favourite, but that was the extent of any asymmetric treatment of the two teams.

More recently I've come across some articles in the literature suggesting that there is an optimal home team probability for some sports that serves to balance the Goldilocksian desires of home team fans to feel that a home team victory is more likely than not, but not so likely that it renders the contest no longer worth watching.

That would suggest a more nuanced asymmetric treatment of home team versus away team prospects is required and it's this aspect that I'm especially keen to visit later in this blog.

CROWDS AND SURPRISE BY TEAM

First though, let's update those 2012 tabulations of crowd, entropy and actual surprisal data, now adding the Home and Away season data for 2013 and 2014 to give us nine complete years of data.

(As in the earlier blog, crowd data has been sourced from the afltables site, to which I again here record my gratitude.)

On the left of the table is the summarised data for teams when playing at home, a status enjoyed by each team on 99 occasions across the nine years, except by the two most-recently joined teams. Note that, for the purposes of this blog, I've used the AFL's designation of 'home' and 'away' status, so, for example, a designated Tigers home game at the Gold Coast Stadium is counted as a Tigers home game.

Here we can see that Adelaide has averaged crowds of just over 39,000 when playing at home and has participated in contests where the average entropy has been 0.847 bits. That's roughly equivalent to a game where the favourite is a 72.6% proposition. Those contests have been slightly more surprising than the TAB Bookmaker expected since the actual average level of surprisals generated by the outcome of those games has been 0.894 bits.
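The link between an entropy of 0.847 bits and a 72.6% favourite can be checked with a few lines of Python. This is a minimal sketch, assuming the entropy in question is the standard two-outcome (binary) entropy measured in bits; the function names are my own and purely illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """Expected surprisal, in bits, of a contest one team wins with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain result carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def favourite_probability(entropy_bits: float, tol: float = 1e-9) -> float:
    """Invert binary_entropy on [0.5, 1] by bisection.

    Entropy falls steadily as the favourite firms from 50% to 100%,
    so a simple bisection search finds the matching probability.
    """
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) > entropy_bits:
            lo = mid  # still too uncertain: the favourite must be firmer
        else:
            hi = mid
    return (lo + hi) / 2


print(round(binary_entropy(0.726), 3))         # 0.847
print(round(favourite_probability(0.847), 3))  # 0.726
```

Because binary entropy is symmetric about 50%, the same 0.847 bits would also describe a game with a 27.4% home team, which is exactly the symmetry discussed at the top of this blog.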

Staying focussed on this block of data we can see that Collingwood are the team with the largest average home crowd at a little under 56,500 per game. At the other end of the scale, GWS have drawn the smallest home crowds at just under 10,000 per game. Amongst the more established teams, the Brisbane Lions have drawn the smallest home crowds (just over 25,000 per game), with the Roos (about 26,000), Port Adelaide (about 26,600), the Western Bulldogs (roughly 27,750), and Melbourne (about 29,700) the only other teams to average under 30,000 per game.

At home, Roos fans have faced the greatest uncertainty of outcome, their average contest promising 0.868 bits of information. Those games have ultimately delivered even more surprise than was expected (0.887 bits), but this is considerably less than the average surprise delivered by Essendon (1.011 bits) and Port Adelaide (0.974 bits) home games. GWS, at home, has promised least surprise (0.575 bits) of any team, but despite being even less surprising (0.544 bits) than expected, finishes just behind Geelong (0.543 bits) in actual surprisal generation. In other words, Geelong's home games have gone even more to script than have GWS'.

(For a refresher on surprisals and entropy, please refer to the 2012 blog.)

Turning next to the middle block in the table, which relates to teams' away statistics, we find the Pies once again claiming top spot in crowd attraction. Their average crowd of just over 53,000 per game is almost 7,000 fans per game higher than the next best team, Essendon, at just over 46,200. Surprisingly, Collingwood and Essendon are two of only eight teams that draw larger average crowds at home than away, and two of only four Victorian teams about which the same can be said, Carlton and Richmond being the other two. Put another way, more than half the teams in the competition have drawn larger average crowds over the past nine years away from home than at home. That fact is also true if we consider only the most recent season, 2014, alone.

Members of those large away crowds for Pies games have witnessed, on average, the most surprising outcomes of any team, each contest having generated 0.915 bits of information (and bear in mind that the toss of a fair coin generates only 1 bit of information). In contrast, the 22,000 fans who've turned up to an average GWS away contest have very rarely been surprised about the outcome. They've witnessed just 0.343 bits of information per outcome, which is about as surprising as picking a business day at random and finding that it isn't a Friday.
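For anyone wanting to check the coin-toss and Friday comparisons, here is a minimal sketch of the surprisal calculation, assuming the usual definition of surprisal as minus the base-2 logarithm of the probability of the outcome that actually occurred. The function name is illustrative only.

```python
import math


def surprisal(p_outcome: float) -> float:
    """Information, in bits, revealed when an outcome of probability p_outcome occurs."""
    return -math.log2(p_outcome)


# A fair coin landing heads: probability 1/2
print(surprisal(0.5))  # 1.0

# A randomly chosen business day not being a Friday: probability 4/5
print(round(surprisal(4 / 5), 3))  # 0.322
```

The second figure, at about 0.32 bits, is in the same neighbourhood as the 0.343 bits per game that GWS away fixtures have averaged, bearing in mind that the GWS figure is an average across many games rather than the surprisal of any single probability.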

Combining the home and the away data tells us that the Pies have drawn about 54,750 fans to an average game across the nine years, which is about 8,000 more than Essendon and 10,000 more than Carlton. It also reveals that games involving the Crows have promised the highest levels of surprise (0.860 bits) and delivered the fourth-highest levels of actual surprise (0.889 bits). Essendon's results have represented the highest level of surprise (0.938 bits per game), which is about 9% more information per game than was expected. That figure of 1.09 is also the highest ratio for any team of actual to expected surprise, the lowest ratio belonging to GWS whose average result has delivered 18% less surprise than was apparently expected. (My entropy calculations do, of course, depend on the methodology employed for inferring probability from head-to-head prices - which for this blog has been the Overround-Equalising variant - and will be affected by the extent to which this methodology misattributes overround to the teams involved).
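As a rough illustration of how probabilities can be inferred from head-to-head prices, the sketch below assumes that the Overround-Equalising variant amounts to levying the same proportional overround on both teams, so that the implied probabilities are simply the inverse prices rescaled to sum to one. That reading is an assumption made for this example, as are the function name and the prices used.

```python
def implied_probabilities(home_price: float, away_price: float):
    """Return (home probability, away probability, total overround).

    Assumes overround is spread proportionally across the two teams,
    so each inverse price is divided by the sum of the inverse prices.
    """
    raw_home = 1 / home_price
    raw_away = 1 / away_price
    total = raw_home + raw_away  # equals 1 + overround
    return raw_home / total, raw_away / total, total - 1


# Hypothetical prices, chosen only for illustration
p_home, p_away, overround = implied_probabilities(1.50, 2.50)
print(round(p_home, 3), round(p_away, 3), round(overround, 3))  # 0.625 0.375 0.067
```

If the bookmaker in fact loads more of the overround onto one team (longshots, say), the probabilities from this sketch, and hence the entropies calculated from them, will be biased accordingly.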

Despite its high nine-year average, Collingwood's crowd-drawing ability appears to be declining. After peaking in 2010 at just under 61,500 fans per game, its home ground attendances have fallen in every succeeding year, most precipitously in 2014 when they fell by more than 7,000 fans per game. Its appeal playing away also declined, the average attendance at its away games falling by over 5,000 fans per game.

Adelaide and Port Adelaide were teams whose home team attendances spiked most notably in 2014, rising by over 14,000 and 17,000 per game respectively as they swapped their home games from Football Park to the revamped Adelaide Oval. Those levels of increase did not extend to their away game attendances however, which in Adelaide's case fell by over 2,000, and in Port Adelaide's rose by just over 3,000 fans per game.

Carlton and Richmond were other teams whose away performances were notably less well attended in 2014, while Sydney enjoyed modest increases in attendances at home and at away contests.

CROWDS AND SURPRISE BY VENUE

If you delight in large crowds then the MCG should have been your venue of choice over the past nine years, each contest there pulling an average attendance of almost 50,000. Assuming instead that you were seeking uncertainty - and, in these uncertain times, who isn't? - then Football Park would've been your preferred venue, promising that you'd leave the ground with 0.858 more bits of information than you'd had when you'd entered.

That choice would have left you a little disappointed however, as you would have boarded your transportation home with only 0.748 bits of information, considerably less than those wiser folk who'd ventured to Aurora Stadium and packed away, on average, 1.011 bits to take home. Aurora Stadium, however, offered only 37 opportunities for information acquisition across the nine years so, very arguably, Docklands would have been a superior choice. There you'd have had 428 opportunities to grab 0.974 bits of information per game. In total, the 428 contests at that venue have provided fans with 417 bits of information, making it by far the greatest generator of football information of any venue in use at any time during the period. The next best information pump, the MCG, has generated only 286 bits and this from just 21 fewer games.

CROWDS AND SURPRISE BY DAY OF WEEK

No day of the week has been spared at least one game of football at some point during the nine seasons being reviewed here, though Tuesdays and Wednesdays have been used only twice each. Three of these four games were ANZAC Day Collingwood v Essendon clashes, and the fourth was an odd Hawthorn v Geelong game in 2011, played on the day after ANZAC Day to complete a curtailed Round 5 of just seven games that had started on the previous Thursday. AFL scheduling continues to mystify.

The average crowds on these two days have been very high (about 85,000 to 90,000), especially in comparison to the averages we've seen for Saturdays and Sundays (about 32,000 to 33,000). About 12% of all games have been played on Fridays, with these games drawing about 50% larger crowds than games on the weekend.

Across the core Friday to Sunday block, Fridays have both promised (0.859 bits) and delivered (0.838 bits) the most surprise, while Sundays have delivered least (0.775 bits) despite promising about the same level of surprise as Saturdays. If you're looking for an upset in a typical week of football, then, Sundays would appear to be your worst bet and Fridays your best.

APPEARANCES OF EACH TEAM BY DAY OF WEEK

Lastly, before we analyse the results of the regression modelling, let's review the profile of each team's appearances, home and away, by day of week.

The Gold Coast and GWS have each played about 80% to 85% of their home games on a Saturday, more than any other team, while the Brisbane Lions and Gold Coast have each played 70% or more of their away games on Saturdays.

St Kilda have played almost one-quarter of their home games and almost 20% of their away games on Fridays. Collingwood have also featured prominently on Fridays, playing almost one-quarter of their home games and almost 30% of their away games on this day. Carlton, Essendon, Geelong and Hawthorn have been other teams frequently appearing in the early part of the weekend.

Combining home and away appearances, the Gold Coast have the largest proportion of Saturday games (78%) and the Western Bulldogs the smallest (38%), while Melbourne are highest for Sundays (48%) and Collingwood lowest (20%). Collingwood, however, have the largest proportion of Friday games (26%), and GWS and Gold Coast the smallest (0%). None of the non-Victorian teams have played more than 10% of their games on a Friday.

REGRESSION MODELS

As I mentioned at the start of the blog, one of my goals was to investigate the notion of a crowd-optimising home team probability. Here I've done this by including Home Probability in a simple linear regression, both linearly and as a squared term, which is about the simplest way I can imagine of incorporating Home Probability in a way that affords the notion of an optimum.

Also included in the regression are the same terms as from the 2012 blog post, namely the identities of the home and the away teams, the game venue, and the day of the week on which the game was played. 

In the first of the regressions, the results of which appear on the left of the table at right, no terms are included that relate to individual seasons and the coefficients on Home Probability and its square are both statistically significant and imply that the optimum home probability is about 58%. 

We can derive this by noting that the sum of the Home Probability terms is maximised when the derivative of the sum is equal to zero, which implies that the Home Probability is equal to minus the coefficient on the linear term divided by twice the coefficient on the squared term. 
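Written out as a worked equation, the argument runs as follows. The symbols are notation introduced here purely for illustration and do not appear in the regression output: A is estimated attendance, p is Home Probability, and β1 and β2 are the coefficients on the linear and squared terms respectively.

```latex
\begin{aligned}
A(p) &= \text{(terms not involving } p\text{)} + \beta_1 p + \beta_2 p^2 \\
\frac{dA}{dp} &= \beta_1 + 2\beta_2 p = 0 \\
p^{*} &= -\frac{\beta_1}{2\beta_2}
\end{aligned}
```

This stationary point is a maximum only when β2 is negative, so that the parabola opens downwards; an interior optimum of about 58% therefore requires a positive coefficient on the linear term and a negative one on the squared term.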

This figure of 58% is broadly consistent with the results I've seen for other sports (see, for example, this piece on NBA, which records an optimum home team probability of 67% and alludes to results of between 60% and 67% for MLB; this piece, which finds an optimum home team probability of 60.5% for the NRL; and this article, which summarises the findings of a large number of other papers and suggests that a range of optima have been found for some sports, and none at all for others, but that optima, where they exist, are most often in the 60% to 70% range).

For the regression whose results are recorded on the right of the table I've allowed for season-specific coefficients for the two terms involving home team probability, which serves to modestly lift the proportion of explained variance from 77.87% to 78.18% and which also affords the opportunity to calculate season-specific optimal home team probabilities.

These optima are shown in the table below and suggest that fans preferred stronger home team favourites in 2006 and 2007, but also that they have since generally rewarded less certain home team victors with their attendance in ensuing seasons.

The most recent season has hinted at a return towards a preference for more highly fancied home teams, though the optimum of 58.7% for 2014 remains well below the 64% and 65% optima of those earliest analysed years.

It seems then, that AFL crowds most prefer home teams that are modest favourites. The Goldilocks zone appears to span an approximate range from 55% to 65% (which equates to prices of about $1.50 to $1.75 assuming a 5% overround) within which, using the 2014 coefficients, estimated attendances are practically flat. Outside that zone, increases or decreases in home team probability have more material effects on expected attendance. For example, about another 660 fans could be expected to attend a game with a 59% home team favourite compared to the same game with a 3/1 on home team favourite. Compared to a 4/1 on home team favourite the additional expected attendance would be 1,130 fans.
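The conversions in the paragraph above can be sketched in Python. Three assumptions are baked in and are mine rather than anything taken from the regression output: that "n/1 on" means a probability of n/(n+1) before overround, that a 5% overround inflates each probability by a factor of 1.05, and that the optimum is the 58.7% quoted earlier for 2014. The function names are illustrative.

```python
def odds_on_to_probability(n: float) -> float:
    """'n/1 on' means staking n to win 1, so the implied probability is n / (n + 1)."""
    return n / (n + 1)


def price_for_probability(p: float, overround: float = 0.05) -> float:
    """Decimal price for probability p, assuming p is inflated by (1 + overround)."""
    return 1 / (p * (1 + overround))


# Price range for the 55% to 65% zone
print(round(price_for_probability(0.65), 2), round(price_for_probability(0.55), 2))  # 1.47 1.73

# 3/1 on and 4/1 on favourites
p3 = odds_on_to_probability(3)  # 0.75
p4 = odds_on_to_probability(4)  # 0.8

# With a quadratic in Home Probability, the attendance shortfall relative to
# the optimum grows with the square of the distance from that optimum.
optimum = 0.587
shortfall_ratio = ((p4 - optimum) / (p3 - optimum)) ** 2
print(round(shortfall_ratio, 2))  # 1.71
print(round(1130 / 660, 2))       # 1.71
```

The last two lines agree, which is a useful sanity check: the 660 and 1,130 fan differences are in just the ratio that a quadratic centred near 58.7% would produce for 75% and 80% favourites.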

Similar calculations for other home team probabilities are shown in the chart at right.

A comparison of the regression coefficients in this blog with those from the 2012 blog in which Entropy was used instead of Home Probability (and its square) shows broad similarity in the outputs. This is partly because of the high empirical correlation between Entropy and Home Probability x (1-Home Probability), which comes in at +0.998 across the nine years. It's also because the crowd-optimising Home Probability turns out to be quite close to 50%, which is the implicit optimum imposed by the earlier formulation using Entropy. To see why this is the case, recall that the 2012 regression had a positive coefficient on Entropy, so higher Entropy implied larger estimated attendance. Next, recall that Entropy is maximised when the Home Probability is 50%. QED.
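The near-equivalence of Entropy and Home Probability x (1-Home Probability) is easy to see numerically. The sketch below uses an evenly spaced grid of probabilities from 10% to 90% rather than the actual nine years of game data, so the correlation it prints is indicative only and is not the +0.998 quoted above; the helper names are my own.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy in bits for a contest with home probability p (0 < p < 1)."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pearson(xs, ys) -> float:
    """Plain Pearson correlation coefficient, written out to avoid dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative grid of home probabilities (not real game data)
probs = [i / 100 for i in range(10, 91)]
entropy = [binary_entropy(p) for p in probs]
product = [p * (1 - p) for p in probs]

print(pearson(entropy, product))       # very close to 1 on this grid
print(max(probs, key=binary_entropy))  # 0.5, where entropy peaks
```

Both curves are symmetric humps peaking at 50%, which is why a regression using Entropy implicitly places the optimum there while the quadratic formulation is free to put it elsewhere.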

CONCLUSION

The raw historical data reveals that per game attendances in 2014, though up marginally on the 2012 and 2013 figures, are nonetheless down by about 4,000 fans per game (or 13%) on the 2008 home and away season peak.

This decline has coincided with the introduction of the Gold Coast and GWS into the competition and with an overall increase in the variability of home team probability, as shown by the boxplot at left. The standard deviation of home team probability peaked in 2012 at 27.7 percentage points, which is almost 30% higher than the standard deviation for 2008.

Increased variability has meant a decrease in the proportion of games within the "Goldilocks Zone" of home team probabilities, which I've defined as the set of probabilities within 15% of the optimal season-by-season home team probabilities as estimated using the second of the regression models above. It's really only outside this range, the modelling suggests, that the impact on estimated attendance is material - say 500 fans or more.
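To make the definition concrete, here is a minimal sketch of how the proportion of games in the Goldilocks Zone might be computed for a season. It assumes that "within 15%" means within 15 percentage points either side of the season's optimum, and the home team probabilities in the example are hypothetical values invented purely for illustration.

```python
def goldilocks_share(home_probs, optimum, half_width=0.15):
    """Proportion of games whose home team probability lies within
    half_width (in probability points) of the season's optimum."""
    inside = [p for p in home_probs if abs(p - optimum) <= half_width]
    return len(inside) / len(home_probs)


# Hypothetical round of nine games, not actual bookmaker data
example_probs = [0.25, 0.48, 0.55, 0.62, 0.71, 0.80, 0.88, 0.35, 0.93]

# Using the 2014 optimum of 58.7% quoted earlier
share = goldilocks_share(example_probs, optimum=0.587)
print(round(share, 2))  # 0.44
```

Running the same function season by season, each with its own estimated optimum, would give the kind of year-on-year proportions discussed in the remainder of this section.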

It'd be wrong though to attribute too much of the per game decline in attendances to the changing distribution of home team probabilities in recent seasons - the modelled effect of a change in home team probability from the 2014 optimum of 59% to, say, 30% or 88% is only about 2,000 fans, which is only one half of the decline from 2008.

Still, it'd be nice to move closer to the situation as it was in the 2007 to 2010 period when 40-50% of all games were in the Goldilocks zone rather than the situation as it stands now where the proportion is nearer 30%. Here's hoping that the Suns and the Giants continue to improve, that the Dees and the Saints become more competitive, and that no other team surprises us with its ineptitude in 2015.