The Predictability of Men’s AFL Crowds Revisited
This week the topic of what characteristics of a game of men’s AFL football might be correlated with its attendance was discussed on Twitter, which caused me to review this blog from 2015 where I looked at the same topic.
Today I’m going to revisit that topic in much the same way as I did then, but extending the time frame to include the home and away seasons through 2019, and including a few more potentially predictive characteristics.
THE DATA
As just foreshadowed, we'll be using data for the period 2000 to 2019, focussing solely on games from the home-and-away portion of those seasons.
We'll source attendance figures from the R fitzRoy package and include, as regressors, variables that describe:
The Home Team
The Away Team
Whether or not the Home Team was the pre-game Favourite (as determined by MoSH2020 Team Ratings)
The Strength of Favouritism for the team preferred by MoSH2020 (measured in terms of the expected margin)
The Venue at which the game was played
The Time of Day at which the game commenced
The Day of the Week on which the game took place
The Month of the Year in which the game took place
The Year in which the game took place
With the exception of using MoSH2020 in place of MoSSBODS, these are the same variables as we used in the previous blog. For this new blog we’re going to add:
Whether or not the game was played on a “special” day (viz ANZAC Day, Mothers’ Day, or Easter Friday, Saturday, Sunday, or Monday)
Whether or not the game involved two teams from the same State
Whether or not the game was a Victorian Rivalry game (according to this list in Wikipedia)
Where the teams sat on the competition ladder going into the round
THE REGRESSORS
Before we look at the fitted model, let's explore the relationship between Attendance and each of the proposed regressors, in turn, starting with the Season in which the game took place.
In this first table we look at Average Attendance at games within each of the 20 home-and-away seasons and find that the all-season average attendance is just over 34,000 per game, with the most recent seasons returning the competition to attendance levels previously seen in 2011, but not quite to those of the 2005 to 2009 era. The phenomenon of average attendance figures growing for three consecutive years, which we see for the 2017 to 2019 period, had not occurred since 2003 to 2005.
The variability of crowds, as described by the standard deviation, peaked for the 2018 and 2019 seasons, eclipsing the values seen during the 2011 to 2013 period with the introduction of the Gold Coast and GWS teams. Whilst it’s obviously a good thing for the League that average attendance figures are rising, I imagine that the increased unpredictability must bring its own headaches.
For modelling purposes we’ll split the 2000 to 2019 history into four equal periods of five years, although a case could be made for different splits.
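A minimal sketch of that grouping, assuming each period is labelled by its first and last season. The function name season_period is illustrative only and is not taken from the original analysis.

```python
def season_period(year: int, first: int = 2000, last: int = 2019, width: int = 5) -> str:
    """Map a season to its five-year period label, e.g. 2007 -> '2005-2009'."""
    if not first <= year <= last:
        raise ValueError(f"season {year} is outside {first}-{last}")
    # Integer division snaps the season back to the first year of its period.
    start = first + ((year - first) // width) * width
    return f"{start}-{start + width - 1}"


periods = sorted({season_period(y) for y in range(2000, 2020)})
print(periods)  # ['2000-2004', '2005-2009', '2010-2014', '2015-2019']
```

Changing width or first is all that's needed to try one of the alternative splits.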
Next, let's look at average attendances for each of the 18 teams when they are the designated home team.
This table is sorted on the basis of average attendance and sees Collingwood perched atop the list having, on average, generated crowds when playing as the home team of a bit under 52,000.
They are one of seven teams that can claim to have generated above average home crowds, the six other teams including four more from Victoria (Essendon, Richmond, Carlton and Hawthorn), and just two non-Victorian teams (Adelaide and West Coast). Fremantle only narrowly misses the list.
Three non-Victorian teams fill the bottom three places (GWS, Gold Coast and the Brisbane Lions), the latter just nudging out the Kangaroos who've seen crowds of around 25,350 at their typical home game. GWS’ average home attendance figure is over 40,000 fans per game lower than the average for Collingwood, and less than half the figure of Brisbane Lions, who sit 16th on the list.
Hawthorn have seen the most variable home crowds, though Essendon, Carlton, Collingwood, Richmond and Melbourne have also been associated with above-average variability of home game crowd sizes. Amongst the teams drawing larger home crowds, Adelaide and West Coast stand out as producing the lowest levels of variability.
Looking at the teams when they are designated Away teams instead, we find that Collingwood, again, is associated with the largest average crowds (just over 50,000 per game), and that Essendon is once again second to them.
In fact, of the six teams with above-average drawing power as Away teams, five of them also have above-average drawing power as Home teams (Collingwood, Essendon, Carlton, Hawthorn and Richmond). Geelong is the only team that climbs into the Top 6 as an Away team that wasn't also towards the top of the list as a Home team.
Notably, all six of the teams are based in Victoria.
In contrast, the seven teams with the smallest average attendances when playing as the Away team are all non-Victorian.
We see, however, a much smaller range in the average attendance figures when looked at in this way, with the bottom-placed Gold Coast only about 27,000 fans per game behind the top-placed Collingwood.
The greatest variability in attendance as an Away team is, however, associated with those six Victorian teams at the top of the list, suggesting that their attendance figures are much more boom or bust than for other teams.
When we turn next to Venue we find a considerable range of average attendances, even for grounds that have been used quite a number of times during the period.
The MCG has attracted the largest average crowd of just under 48,000 per game. Perth Stadium has rapidly moved into 2nd place, only about 600 fans per game behind after 45 matches. The Adelaide Oval and Stadium Australia are the only other venues to have averaged over 40,000 fans per game.
Four more non-Victorian venues (Subiaco, Football Park, the Gabba, and the SCG) fill slots in the next five places, all of them with average attendances in roughly the 24,000 to 33,000 per game range.
At the foot of the table we find a variety of non-Victorian grounds with averages of around 15,000 fans per game or lower.
Variability in attendance figures has been greatest by far for the MCG, whose standard deviation of almost 18,500 attendees is over 50% larger than the figure of almost 12,000 for Stadium Australia. Many of the venues lower down the table show much smaller variability in attendance, partly because of their limited capacity.
Next, let's consider the crowd-boosting effects of home team favouritism (determined by MoSH2020 Ratings and not any bookmaker).
We find a small, though highly statistically significant, jump in attendance when the home team is the favourite, equivalent to a little over 700 people per game. It’s just a bit more fun going to support your team at home, I guess, when they’re more likely to win than lose.
One aspect of this table that's interesting to note is just how common it is for the home team to be favourite. It's the case just over 62% of the time, which is well above what we'd expect due to chance.
Next we look at the effects of month, day of week, and time of day.
Firstly, here's a table of average attendances by month.
It reveals that early-season games tend to attract the largest crowds, with games in March dragging an additional 4,200 fans through the gate, and those in April, which include ANZAC Day clashes, attracting about an additional 1,700 fans.
The period May through August then sees very similar average attendance figures, all in roughly the 33,000 to 34,000 range.
Those seasons where there have been home and away games in September have produced crowds a bit over 2,600 below average, a reflection perhaps of the likely relative unimportance of those games in the context of the relevant season's Finals.
The greatest variability in crowds has been seen across games played in April, narrowly ahead of those played in March, while the lowest variability has been for games played in September, with games played in August and June showing only slightly more variability.
From a Day of Week perspective, days other than those of the weekend have produced the largest average crowds, lifted by the "blockbuster" or “special” nature of these contests.
The largest average crowds have come on Tuesdays and Wednesdays, though the paucity of games on those days makes this average subject to extremely large sample variation.
Amongst those days on which more than 100 games have been played - Friday, Saturday and Sunday - Friday has the highest average at just under 44,000 fans per game, while Sunday has the lowest at just under 31,500 fans per game. Across these three days, Sunday crowds have been the least variable, and Saturday crowds the most variable, though only slightly more so than Friday crowds.
Games starting at 7:30pm or later have attracted significantly larger crowds (about 41,200 per game), while those starting before 4:30pm have attracted the smallest crowds (just under 33,000).
The greatest variability in attendance, however, has also been associated with games starting at 7:30pm or later, while the smallest variability has been associated with games starting in the 4:30pm to 7:30pm window.
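The three start-time windows can be sketched as a simple banding function. The boundaries follow the wording above (a 4:30pm start falls in the middle window, a 7:30pm start in the late one). The function name and labels are illustrative assumptions.

```python
from datetime import time


def start_time_band(start: time) -> str:
    """Assign a local start time to one of the three windows used in the post."""
    if start < time(16, 30):
        return "Before 4:30pm"
    if start < time(19, 30):
        return "4:30pm to 7:30pm"
    return "7:30pm or later"


print(start_time_band(time(13, 45)))  # Before 4:30pm
print(start_time_band(time(19, 50)))  # 7:30pm or later
```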
Next we quantify the raw effect of the expected competitiveness of the contest on attendances, which here we'll measure using MoSH2020's expected victory margin for the favourite, measured in points.
What we find is a clear relationship between attendances and the strength of the favourite, with games expected to be close attracting an additional 1,700 or so fans above average, and with games expected to be blowouts seeing crowds, on average, over 6,000 below the all-game average.
Variability of attendance is highest in games where the favourite is expected to win by 12 to 23 points, and is lowest in games where the favourite is expected to win by 48 points or more.
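Here's a rough sketch of how expected margins might be banded. Only the under-12, 12 to 23, and 48-or-more bands are named in this post. The 24 and 36 point edges are an assumption that the bands step in equal 12-point increments.

```python
from bisect import bisect_right

# Band edges in points. 12 and 48 come from the text; 24 and 36 are ASSUMED.
EDGES = [12, 24, 36, 48]
LABELS = ["Under 12", "12 to 23", "24 to 35", "36 to 47", "48 or more"]


def margin_band(expected_margin: float) -> str:
    """Band the favourite's expected victory margin, measured in points."""
    # abs() lets us pass a signed home-minus-away margin as well.
    return LABELS[bisect_right(EDGES, abs(expected_margin))]


print(margin_band(7.5))   # Under 12
print(margin_band(-52))   # 48 or more
```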
While fans clearly, according to this analysis, prefer contests that are expected to be close, we might also hypothesise that they’d rather see games more likely to affect the composition of the finals. To this end we next look at attendance figures grouping games on the basis of the ladder positions of the home and the away team going into the game.
What we find is that attendance figures are highest when both teams are currently in the Top 8, such games enjoying average attendances over 6,000 fans per game higher than the all-game average. Conversely, attendance figures are lowest when neither team is currently in the Top 8. These games draw crowds about 3,300 fans smaller than the all-game average.
Highest attendance variability is associated with those games involving two teams from the Top 8, and lowest variability with games where the home team is outside the Top 8 and the away team is inside the Top 8.
Now there’s no doubt that parochialism and tradition play a part in attendance figures, which is what we’ll investigate next and finally.
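The four ladder groupings can be sketched as below, assuming we already have each team's ladder position going into the round. How Round 1 games (where there is no prior ladder) were treated isn't stated here, so this sketch doesn't address them. The function name and labels are illustrative.

```python
def ladder_status(home_position: int, away_position: int, cutoff: int = 8) -> str:
    """Classify a game by which of the two teams sit inside the Top 8."""
    home_in = home_position <= cutoff
    away_in = away_position <= cutoff
    if home_in and away_in:
        return "Both in Top 8"
    if home_in:
        return "Home only in Top 8"
    if away_in:
        return "Away only in Top 8"
    return "Neither in Top 8"


print(ladder_status(3, 8))    # Both in Top 8
print(ladder_status(12, 17))  # Neither in Top 8
```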
Firstly we’ll look at the attendance figures based on the home States of the competing teams.
We see that the Port Adelaide v Adelaide clash, on average, generates the largest crowd at just under 45,000 per game, about 1,500 per game higher than the average crowd for a game that sees two Victorian teams face off. These games involving two Victorian teams also produce crowds with the highest variability, however.
The lowest crowds come for the Brisbane Lions v Gold Coast games (just under 18,500 per game), though average crowds for games involving teams from different States are also well below the all-game average.
For our final univariate analysis we group games based on whether or not they are considered a “traditional rivalry game” between two Victorian teams, which is one of the following 12 matchups:
Any pairing of two from Carlton, Collingwood, Richmond, or Essendon
Any pairing of two from Hawthorn, North Melbourne, or Essendon
Hawthorn v Geelong
Collingwood v Geelong
Collingwood v Melbourne
Defined in this way, Victorian rivalry games have an average attendance of almost 59,000 per game, which is almost 25,000 per game above the all-game average. These games also have higher variability of attendance, however, than non Victorian-rivalry games.
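As a check on the count, the definition above can be enumerated directly. This sketch stores each matchup as an unordered pair so that home and away order doesn't matter. The helper names are illustrative.

```python
from itertools import combinations


def victorian_rivalries() -> set:
    """Build the set of unordered rivalry pairings listed in the post."""
    pairs = set()
    for group in (
        ["Carlton", "Collingwood", "Richmond", "Essendon"],
        ["Hawthorn", "North Melbourne", "Essendon"],
    ):
        pairs.update(frozenset(pair) for pair in combinations(group, 2))
    for pair in [
        ("Hawthorn", "Geelong"),
        ("Collingwood", "Geelong"),
        ("Collingwood", "Melbourne"),
    ]:
        pairs.add(frozenset(pair))
    return pairs


RIVALRIES = victorian_rivalries()


def is_rivalry(home: str, away: str) -> bool:
    return frozenset((home, away)) in RIVALRIES


print(len(RIVALRIES))  # 12
```

The first group contributes six pairings, the second three, and the three named matchups make up the rest.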
THE MODEL
So, what happens if we construct an OLS regression where we attempt to fit actual attendance as a function of the variables just explored?
We end up with a model that explains over 83% of the variability in actual attendances across the past 20 home and away seasons, and that is:
within 620 of the actual attendance 10% of the time
within 1,620 of the actual attendance 25% of the time
within 3,600 of the actual attendance 50% of the time
within 6,300 of the actual attendance 75% of the time
within 9,800 of the actual attendance 90% of the time
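Figures like these are quantiles of the absolute errors. Below is a minimal sketch using a nearest-rank quantile, which is an assumption (the quantile method used for the numbers above isn't stated). The attendance values are made up purely for illustration.

```python
import math


def abs_error_quantile(actual, fitted, share):
    """Smallest error bound that covers at least `share` of the games."""
    errors = sorted(abs(a - f) for a, f in zip(actual, fitted))
    # Nearest-rank method: take the k-th smallest error, k = ceil(share * n).
    k = max(1, math.ceil(share * len(errors)))
    return errors[k - 1]


# Made-up figures for illustration only, NOT taken from the model.
actual = [30000, 42000, 25000, 61000, 18000]
fitted = [31000, 40000, 25500, 55000, 21000]

print(abs_error_quantile(actual, fitted, 0.50))  # 2000
print(abs_error_quantile(actual, fitted, 0.90))  # 6000
```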
The first set of coefficients from that model is summarised at right. Interpreting its coefficients requires an understanding of the model’s construction, in particular its "reference" levels, which mean that the coefficients in some of the blocks are estimates relative to a game played:
at the MCG
with Hawthorn at home
with West Coast away
on a Friday
starting before 4:30pm
sometime in the 2000 to 2004 period
in March
but not at Easter
where West Coast start as favourites
and where neither team was in the Top 8 going into the game
The coefficients give us the incremental effects of the characteristics we’re looking at, after controlling for all other variables in the model, and holding characteristics at their reference levels where necessary.
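To make the role of reference levels concrete, here's a toy sketch in which a game at its reference levels gets only the intercept, and every departure from a reference level adds that level's coefficient. The intercept and coefficients are hypothetical round numbers, not the fitted values, and the sketch ignores the interaction terms that the real model includes.

```python
# HYPOTHETICAL values for illustration only; they are not the fitted estimates.
INTERCEPT = 50000.0
COEFFICIENTS = {
    ("venue", "Docklands"): -15000.0,
    ("day", "Saturday"): -2000.0,
}
# Reference levels, as described in the list above (two of them only).
REFERENCE = {"venue": "MCG", "day": "Friday"}


def fitted_attendance(game: dict) -> float:
    """Intercept plus the coefficient of every non-reference level."""
    total = INTERCEPT
    for variable, level in game.items():
        if level == REFERENCE[variable]:
            continue  # reference levels are absorbed into the intercept
        total += COEFFICIENTS[(variable, level)]
    return total


print(fitted_attendance({"venue": "MCG", "day": "Friday"}))          # 50000.0
print(fitted_attendance({"venue": "Docklands", "day": "Saturday"}))  # 33000.0
```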
Looking at each of the coefficient blocks in turn we see that, from the first block, Carlton, Collingwood, Essendon, Geelong and Richmond might be expected to draw a larger crowd than Hawthorn when playing at home (which, Geelong aside, we saw in the earlier table).
The second block reveals that the Lions fans are likely to attend in greater number when the Lions are favourites than are fans of other clubs, and that the Roos' and Cats' fans are least likely.
Next, the third block tells us that a number of teams are better drawcards than others as Away teams, most notably Collingwood, Essendon, Carlton, Richmond, Sydney and Geelong. GWS, Fremantle, Melbourne, North Melbourne, Western Bulldogs and Port Adelaide are significantly poorer drawcards.
More coefficients are shown in the table left, the first block of which talks to Venue (relative to the MCG) and shows that only Adelaide Oval can be expected to draw a larger crowd for a given contest (after controlling for all other variables in the model). Most other venues are expected to draw crowds between 15,000 and 25,000 smaller (although it's important to factor in the Home team coefficients from Blocks 1, and perhaps 2 and 3 above, when coming up with a likely crowd figure, because certain teams are more likely to be the Home team at particular venues).
The next two blocks estimate the joint effects of the day of the week on which a contest is played and the local time at which the game starts, and show that, for example, relative to a (notional) Friday game starting before 4:30pm, a Saturday game starting at 7:30pm or later would attract a crowd -2,006+1,488-590, or about 1,100 fans smaller.
As another example, a Sunday game starting between 4:30pm and 7:30pm would be expected to attract a crowd -1,939-759+171, or about 2,500 fans smaller.
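For anyone wanting to check the arithmetic, the two examples reduce to summing the quoted terms. Which term belongs to which coefficient isn't spelled out above, so the sketch simply adds them.

```python
# Terms exactly as quoted in the text, relative to a Friday game starting before 4:30pm.
saturday_730pm_or_later = [-2006, 1488, -590]
sunday_430pm_to_730pm = [-1939, -759, 171]

print(sum(saturday_730pm_or_later))  # -1108
print(sum(sunday_430pm_to_730pm))    # -2527
```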
We see from the coefficient for the MoSH2020 absolute expected margin that every additional point of superiority for the Favourite knocks just under 50 people off the crowd, so a 20-point favourite would, for example, be expected to reduce the attendance by about 1,000 people, or about 3% of the average crowd. Of course, if that favourite were also the home team then the coefficients from the earlier Block 2 need to be considered as well.
Finally here, amongst the last block of coefficients in this section we have the effects of the seasons (relative to the 2000 to 2004 period) and find that crowds in the 2005 to 2009 period were almost 3,100 higher per game, while those in the 2010 to 2014 period were just under 2,700 higher per game, and those in the 2015 to 2019 period just under 2,600 higher per game. That's an interesting result when you look back at the raw season-by-season attendance figures, which have the 2010 to 2014 era with an average attendance only 400 fans per game higher than 2000 to 2004, and the 2015 to 2019 era only 550 fans per game higher. The coefficients in the model, however, adjust for the different mixes of games across the seasons, adjusting, for example, for the generally attendance-depressing impact of the introduction of Gold Coast and GWS.
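The 20-point example works out as follows, using rounded inputs: 50 fans per point stands in for "just under 50", and 34,000 for the all-game average of "just over 34,000". Both roundings are assumptions made for illustration.

```python
PER_POINT_EFFECT = -50     # fans per point of expected margin (rounded)
ALL_GAME_AVERAGE = 34_000  # all-game average attendance (rounded)


def margin_effect(expected_margin: float) -> float:
    """Estimated change in crowd for a favourite with this expected margin."""
    return PER_POINT_EFFECT * abs(expected_margin)


effect = margin_effect(20)
share = effect / ALL_GAME_AVERAGE
print(effect, f"{share:.1%}")  # -1000 -2.9%
```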
In this next block we look at the attendance effects related to holiday and “blockbuster” games played on ANZAC Day, Mothers’ Day or any of the days of the Easter period.
We find, for example, that:
ANZAC Day games at the MCG produce a crowd about 13,500 higher than would be expected on another day
A lesser (and, in some cases, net negative) effect is evident for ANZAC Day games played at other venues. Docklands and Manuka, for example, see smaller crowds on ANZAC Day than they’d expect to see for the same teams on a different day
Mothers’ Day games at the MCG produce a crowd about 12,000 lower than would be expected on another day
Elsewhere the reduction is smaller and, in the cases of Adelaide Oval, Subiaco, and York Park, net positive
Good Friday games at the MCG produce a crowd about 1,800 larger than would be expected on another day. At Docklands, the increase is close to 10,000
At most venues, Easter Saturday sees smaller crowds than what would be expected on another day
Easter Sunday brings smaller crowds at the MCG, but larger - or, at least, no different - crowd figures at other venues
Easter Monday games at the MCG produce a crowd almost 8,000 larger than would be expected on another day. At Docklands, the increase is closer to 5,000.
The final block of coefficients looks firstly at the States from which the teams hail and, in the case of a matchup involving two Victorian teams, whether or not they’re considered to be “traditional rivals”.
We find that games between traditional Victorian rivals when played at the MCG draw crowds over 18,000 (9,803 + 8,211) higher than would otherwise be expected based on the other coefficients in the model. Playing the same game at Docklands would see a smaller increase of around 8,500, and at Kardinia Park, Princes Park, or York Park, an increase of only around 3,500 to 4,000.
Port Adelaide v Adelaide clashes are expected to bring crowds about 13,000 higher, and between Fremantle and West Coast almost 9,000 higher. Brisbane Lions v Gold Coast clashes see virtually no increase or decrease, and GWS v Sydney clashes see an expected decrease of just over 2,000.
Looking next at when the games are played and the ladder status of each of the teams (calculated using the enormously handy return_ladder() function in fitzRoy) we find, for example, that:
A clash between two teams in the Top 8, played in August, is expected to draw an incremental 3,400 or so fans (-752-7,816+11,979)
The same clash but where neither team is in the Top 8, is expected to draw almost 8,500 fewer fans (ie the coefficient for August)
Overall, the largest increases come for games played in July or August and involving two teams in the Top 8
The largest decreases come for games played in August or September and involving two teams that are outside the Top 8
Games where only the Away team is in the Top 8 are generally expected to draw crowds about 1,000 to 2,000 lower than games where only the Home team is in the Top 8. This figure leaps to nearer 4,000, however, if the game is in August.
The very last figure reveals that the model, in its entirety, accounts for just over 83% of the variability in attendance figures across the 3,706 games.
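That proportion is the model's R-squared: one minus the ratio of the residual sum of squares to the total sum of squares. Here's a minimal sketch of the calculation for any paired lists of actual and fitted attendances. The function name is illustrative.

```python
def r_squared(actual, fitted):
    """Share of the variability in `actual` accounted for by `fitted`."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


# A perfect fit explains everything; predicting the mean explains nothing.
print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0
```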
WHERE ARE THE ERRORS?
A plot of the Actual versus the Fitted attendance reveals a reasonably good fit to crowds of all sizes though maybe a slight tendency to underestimate the attendance at games which attract audiences above 50,000.
A further analysis shows that this tendency is not the case for games played on “special” days.
Looking at mean absolute errors for each of the regressors, we find that absolute errors are:
Largest in August (5,120) and smallest in April (4,148)
Largest in 2000 (5,128) and smallest in 2006 (3,819). For 2019, the mean absolute error was 4,937.
Largest for Wednesday (7,352) and smallest for Sunday (4,358)
Virtually identical for games starting before 4:30PM (4,445) or between 4:30PM and 7:30PM (4,432), and highest for games starting at 7:30PM or later (5,699)
Largest for Richmond home games (7,094) and smallest for GWS home games (2,373)
Largest for Essendon away games (6,022) and smallest for Adelaide away games (3,592)
Largest for Stadium Australia games (7,953) and smallest for Jiangwan Stadium games (1,183) amongst venues used more than once
Larger when the home team is favourite (4,675 vs 4,554)
Largest when there is less than a 12-point favourite (4,769) and smallest when there is a 48-point favourite (3,929)
Largest for games played on the Queen’s Birthday Holiday (4,950) and smallest for games played on Mothers’ Day (2,702)
Largest for games involving two Victorian teams (5,896) and smallest for games involving two West Australian teams (3,341)
Larger when the game is between traditional Victorian rivals (8,083 vs 4,236)
Largest for games involving two teams in the Top 8 (5,116) and smallest for games where only the away team is in the Top 8 (4,277)
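Each of these comparisons is a mean absolute error calculated within the levels of one regressor. Below is a small sketch of that grouping. The row layout (dicts with "actual" and "fitted" keys) and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def mae_by_group(rows, key):
    """Mean absolute error of fitted vs actual attendance for each level of `key`."""
    totals = defaultdict(lambda: [0.0, 0])  # level -> [sum of abs errors, count]
    for row in rows:
        bucket = totals[row[key]]
        bucket[0] += abs(row["actual"] - row["fitted"])
        bucket[1] += 1
    return {level: total / count for level, (total, count) in totals.items()}


# Made-up rows for illustration only.
rows = [
    {"day": "Sunday", "actual": 30000, "fitted": 31000},
    {"day": "Sunday", "actual": 20000, "fitted": 17000},
    {"day": "Friday", "actual": 50000, "fitted": 45000},
]
print(mae_by_group(rows, "day"))  # {'Sunday': 2000.0, 'Friday': 5000.0}
```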
WHAT NEXT?
It could be that the remaining 17% or so of variability in attendance figures is genuine unpredictable noise, but there remain a few other factors that might be predictive, including:
Weather (rain, temperature, wind, humidity, etc)
Whether or not the game was played as part of a round with byes
Whether or not there were other major sporting events on at the same time and/or in the same State
The availability or unavailability of key players
The average experience of the named squads for the two teams
There might also be some non-linear effects (for example, for the expected margin where we see the mean absolute error fall as the expected margin increases) and higher level interactions that are missed by an OLS regression.
All things to consider for a future post.