A Different Way of Clustering Men's AFL Games Based on the Margin Trajectory
Now it might be that you’re indifferent about whether it was the home or the away team that was leading at any point, but instead care about the size of the margin from the point of view of the team that eventually won.
In today’s blog we’re going to revisit the analysis of the previous blog, using the same data, but looking at it from the viewpoint of the winning teams.
(Thanks to Daniel from InsightLane for this idea. You can find all of the score progressions used here on the ScoreWorm page of his website.)
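For anyone wanting to replicate the change of viewpoint, here is a minimal sketch of one way to do it. It assumes each game is stored as a list of margins from the home team's perspective, with the last value being the final margin. The function name and the treatment of draws (left in the home team's frame of reference) are my assumptions for illustration only.

```python
def winner_perspective(home_margins):
    """Re-express a home-team margin series from the eventual winner's viewpoint.

    home_margins: margins (home score minus away score) sampled through the
    game, with the last value being the final margin.
    """
    final = home_margins[-1]
    # Assumption: drawn games are left in the home team's frame of reference.
    sign = -1 if final < 0 else 1
    return [sign * m for m in home_margins]
```

A game the away team won by 20 points after trailing briefly, say [-6, -12, 3, -20], becomes [6, 12, -3, 20], while a home win is left unchanged.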
METHODOLOGY
Our methodology is as per that from the previous analysis except that we use “q=7” rather than “q=5” in the repr_matrix call, simply because it seems to give fewer and more homogeneous clusters at the optimum. (If you’ve ever done any clustering analysis, you’ll appreciate just how subjective some of the decisions about preferable solutions really are.)
Again, our 1946 x 20 dataframe is reduced to a 1946 x 4 matrix representation of it.
When we cluster the columns of that matrix and chart the Davies-Bouldin metric, we find the optimum at 20 clusters, although its value isn’t substantially better than the value for 11 clusters. Again somewhat subjectively, the 20-cluster solution seems to be better than the 11-cluster solution.
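For readers unfamiliar with the Davies-Bouldin metric, the sketch below shows how it can be computed for a given assignment of points to clusters. Lower values indicate tighter, better separated clusters. This is a plain-Python illustration and not the code used for the analysis; the function name and data layout are assumptions, and it will fail if two cluster centroids coincide exactly.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for a labelling of points (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Centroid of each cluster, and mean distance of its members to it
    centroids, scatter = {}, {}
    for lab, members in clusters.items():
        dim = len(members[0])
        c = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        centroids[lab] = c
        scatter[lab] = sum(math.dist(m, c) for m in members) / len(members)

    # For each cluster, take its worst (largest) similarity to any other cluster
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)
```

In practice you would run your clustering for a range of cluster counts, compute this value for each solution, and chart the results to look for the best value.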
If we chart the time series for each of the 1,946 games based on the cluster to which that series belongs, we obtain the chart below where, again, the red lines track the median margin for games within a cluster, and the blue lines track the time series for the game that was closest to the median for its cluster (in terms of average deviation from the median summed across the whole game).
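The red and blue lines described above can be sketched as follows. I'm assuming here that "deviation" means absolute deviation from the cluster's pointwise median, averaged across the game; the function name and the dictionary layout are hypothetical.

```python
from statistics import median


def archetype(series_by_game):
    """Return (median_trajectory, id of the game closest to that median).

    series_by_game: dict mapping a game id to its margin series; all series
    in the cluster are assumed to have the same length.
    """
    games = list(series_by_game)
    n_points = len(series_by_game[games[0]])
    # Pointwise median across all games in the cluster (the "red line")
    med = [median(series_by_game[g][t] for g in games) for t in range(n_points)]

    def mean_abs_dev(g):
        # Assumption: closeness is the mean absolute deviation from the median
        return sum(abs(x - m) for x, m in zip(series_by_game[g], med)) / n_points

    # Game with the smallest average deviation (the "blue line")
    return med, min(games, key=mean_abs_dev)
```

Running this once per cluster gives one archetypal game for each.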
We see that most cluster types contain, as we’d hope, games with similar margin trajectories for the winning teams.
Eleven of the 20 clusters contain 99 or more games, and only Cluster 19, with 20 games, contains fewer than 50 games.
THE ARCHETYPAL GAMES
In the previous chart, 20 games are identified as being the most archetypal, one for each cluster. Four of them (those for clusters 2, 6, 10, 19) were also archetypes in the previous post. The score progression for those games is shown in blue, and those games are:
Cluster 1: 21/8/10 - Hawthorn v Fremantle (116)
Cluster 2: 27/8/16 - Geelong v Melbourne (111)
Cluster 3: 30/8/15 - St Kilda v Sydney (97)
Cluster 4: 5/8/18 - West Coast v Fremantle (58)
Cluster 5: 2/4/16 - St Kilda v Western Bulldogs (57)
Cluster 6: 2/6/19 - Essendon v Carlton (41)
Cluster 7: 4/5/13 - Gold Coast v Fremantle (45)
Cluster 8: 18/6/11 - Hawthorn v Gold Coast (71)
Cluster 9: 29/6/19 - Collingwood v Kangaroos (44)
Cluster 10: 17/4/10 - Kangaroos v Sydney (40)
Cluster 11: 12/6/15 - Port Adelaide v Geelong (23)
Cluster 12: 5/5/19 - Carlton v Kangaroos (58)
Cluster 13: 16/7/16 - Western Bulldogs v Gold Coast (48)
Cluster 14: 11/8/19 - Richmond v Carlton (28)
Cluster 15: 13/5/12 - Kangaroos v Western Bulldogs (18)
Cluster 16: 29/4/11 - Sydney v Carlton (16)
Cluster 17: 18/7/15 - Geelong v Western Bulldogs (8)
Cluster 18: 15/8/14 - Carlton v Geelong (6)
Cluster 19: 29/5/10 - Brisbane Lions v Collingwood (8)
Cluster 20: 9/4/17 - Carlton v Essendon (15)
The value shown in brackets is the final victory margin, and each game is linked to its matching AFLTables entry.
MIX OF HOME VERSUS AWAY WINNERS
In the previous analysis, each cluster consisted almost exclusively of home wins or away wins, but we’ve no guarantee of that here.
If we calculate the proportion of home wins (or draws) within each of our new clusters, we do find some variability, ranging from a maximum of just over 70% for Cluster 2 to a minimum of just over 45% for Cluster 16.
Eight of the clusters have home team win percentages within a couple of percentage points of the all-game average.
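The calculation behind these proportions is simple enough to sketch. The version below assumes each game is recorded as a pair of cluster number and final margin from the home team's perspective, and counts draws with home wins as described above. The function name and record layout are assumptions.

```python
from collections import defaultdict


def home_win_share(games):
    """games: iterable of (cluster, final_home_margin).

    Returns a dict mapping each cluster to its share of home wins or draws.
    """
    counts = defaultdict(lambda: [0, 0])  # cluster -> [home wins or draws, games]
    for cluster, final_home_margin in games:
        counts[cluster][0] += final_home_margin >= 0
        counts[cluster][1] += 1
    return {c: wins / n for c, (wins, n) in counts.items()}
```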
SEASON PROFILES
If you prefer to watch games where the result is in doubt for the longest, you’d probably opt for games where the winning team had the smallest average margin across the game. As the chart below reveals, you tend to find these in games from clusters 10 and 11, and 17 through 20.
In a typical year, somewhere between about 23% and 39% of home-and-away games will be from one of those clusters. In Season 2019, 39% of games came from those clusters, which was the highest for any season. The low of 23% came in 2012.
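As a rough sketch of how the season percentages can be derived, suppose each game is recorded as a (season, cluster) pair. The set of close clusters is taken from the text above; the function name and record layout are assumptions.

```python
CLOSE_CLUSTERS = {10, 11, 17, 18, 19, 20}


def close_share_by_season(games):
    """games: iterable of (season, cluster).

    Returns a dict mapping each season to the share of its games that
    belong to one of the close clusters.
    """
    totals, close = {}, {}
    for season, cluster in games:
        totals[season] = totals.get(season, 0) + 1
        close[season] = close.get(season, 0) + (cluster in CLOSE_CLUSTERS)
    return {s: close[s] / totals[s] for s in totals}
```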
The counts by cluster type for each season appear in the chart below.
TEAM PROFILE
To finish, let’s call the six clusters we identified as having the smallest average margin for the winning team the “close” clusters, and see what proportion of each team’s wins and losses across the decade have come from these clusters.
For most teams, we see that between 25% and 40% of wins and of losses have come from the close clusters. The notable exceptions are:
Sydney and West Coast, who’ve had relatively few wins from the close clusters
Hawthorn, Sydney, and West Coast, who’ve had a relatively large number of losses from the close clusters
Brisbane Lions and Western Bulldogs, who’ve had relatively few losses from the close clusters
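The team-level proportions can be tallied along the following lines. This sketch assumes each game is recorded as a (winner, loser, cluster) triple and that drawn games are left out; both of those, and the function name, are my assumptions for illustration.

```python
def team_close_shares(games, close_clusters):
    """games: iterable of (winner, loser, cluster).

    Returns a dict mapping each team to a pair: the share of its wins and
    the share of its losses that came from the close clusters.
    """
    tally = {}  # team -> [close wins, wins, close losses, losses]
    for winner, loser, cluster in games:
        is_close = cluster in close_clusters
        w = tally.setdefault(winner, [0, 0, 0, 0])
        w[0] += is_close
        w[1] += 1
        l = tally.setdefault(loser, [0, 0, 0, 0])
        l[2] += is_close
        l[3] += 1
    # None marks a team with no wins (or no losses) in the data
    return {
        team: (cw / w if w else None, cl / l if l else None)
        for team, (cw, w, cl, l) in tally.items()
    }
```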
SUMMARY
What we’ve found in the analysis for this blog is that there are 20 basic plots that describe the dynamics of the winning team’s margin across all 1,946 home-and-away games from the seasons 2010 to 2019.
Again, I’d be keen to hear about any ideas you might have for further analyses using the score progression data.