WHAT CAN MOSHBODS’ TEAM RATINGS AND VENUE PERFORMANCE VALUES TELL US ABOUT TEAMS’ CHANCES OF WINNING?
For this analysis we'll ask a simple question: how do teams' winning rates vary with their MoSHBODS Offensive and Defensive Ratings and their Venue Performance Values, and how has the relationship varied from era to era?
THE DATA
To investigate this, we'll split the entire history of the V/AFL up into 5-season eras (including a slightly longer initial 1897 to 1904 era), take the pre-game Offensive and Defensive Ratings and Venue Performance Values of every team in every game, and then fit a binary logit to the outcomes from the home team's point of view.
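By way of illustration, the era assignment might be coded as below. This is a sketch only, and the function name season_to_era is my own invention rather than anything from the MoSHBODS codebase.

season_to_era <- function(season) {
  # Five-season blocks starting from 1905 (1905-1909, 1910-1914, ...),
  # with the slightly longer initial era covering 1897 to 1904
  block_start <- 1905 + 5 * ((season - 1905) %/% 5)
  ifelse(season <= 1904, "1897-1904", paste0(block_start, "-", block_start + 4))
}

season_to_era(c(1897, 1905, 1972, 1997))
# "1897-1904" "1905-1909" "1970-1974" "1995-1999"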
(Note, firstly, that determining home team status can be a vexed issue in the case of games played overseas or when a team sells a home game and so ends up playing at an away venue for a designated home game. For the most part, I believe I’ve used the AFL’s designation to determine home status, but the issue is deserving of further analysis. One day …
Also note that, because I’m fitting binary logits, draws are problematic, so those games are excluded from the analysis).
For the purposes of model-fitting and performance analysis, each era’s data will be randomly split 50:50 into a training and a testing set.
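A minimal sketch of that split for a single era appears below; era_data is an assumed data frame holding one era's decided (non-drawn) games, not an actual MoSHBODS object.

set.seed(1897)  # fix the random draw so the split is reproducible
train_rows <- sample(nrow(era_data), size = floor(nrow(era_data) / 2))
train_data <- era_data[train_rows, ]
test_data  <- era_data[-train_rows, ]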
THE MODEL
We could fit a variety of classification models to the home win / home loss data we have, but for now we’ll settle for:
glm(Home_Result ~ Home_Offensive_Rating + Home_Defensive_Rating + Away_Offensive_Rating + Away_Defensive_Rating + Home_Venue_Performance_Value + Away_Venue_Performance_Value, family = binomial)
To ascertain the relative importance of each input in estimating home teams’ winning chances in a particular era, we’ll use the varImp function from the R caret package on the model for that era, and standardise the importance values such that the most important variable has an importance of 1.
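In code, the fit-and-standardise step might look something like the following: a sketch under assumed column names (with Home_Result coded 1 for a home win and 0 for an away win), using the train_data frame from the split above.

library(caret)

# Fit the era's binary logit on the training half
era_model <- glm(Home_Result ~ Home_Offensive_Rating + Home_Defensive_Rating +
                   Away_Offensive_Rating + Away_Defensive_Rating +
                   Home_Venue_Performance_Value + Away_Venue_Performance_Value,
                 family = binomial, data = train_data)

# For a glm, caret's varImp returns the absolute t-statistic of each
# coefficient; rescale so the most important variable scores exactly 1
imp <- varImp(era_model)
imp$Overall <- imp$Overall / max(imp$Overall)
imp[order(-imp$Overall), , drop = FALSE]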
To measure the performance of the models we'll use the following measures from the MLmetrics package (a sketch of their computation in R appears after the list):
Accuracy - % of home and away winners correctly predicted
AUC - measure of a model's ability to differentiate between home wins and away wins. Represents the probability that the model, if given a randomly chosen positive and negative example, will rank the positive higher than the negative
LogLoss - the mean negative natural logarithm of the probability attached to the actual result of each game; lower values are better
F1 Score - 2 x correctly predicted away wins/(2 x correctly predicted away wins + wrongly predicted away wins + wrongly predicted home wins), which is equivalent to 2 x (Precision x Recall)/(Precision + Recall). Considered a single summary measure of model performance for the away win class
Precision (also Positive Predictive Value) - proportion of predictions of an away team win that are correct
Specificity - proportion of all home wins correctly predicted
Sensitivity (also Recall) - proportion of all away wins correctly predicted
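As a sketch of how those metrics might be computed (using the assumed era_model and test_data from above, and treating away wins, coded 0, as the positive class for the F1 family):

library(MLmetrics)

test_probs <- predict(era_model, newdata = test_data, type = "response")
test_preds <- ifelse(test_probs > 0.5, 1, 0)  # predict a home win if p > 0.5
actual     <- test_data$Home_Result

Accuracy(y_pred = test_preds, y_true = actual)
AUC(y_pred = test_probs, y_true = actual)
LogLoss(y_pred = test_probs, y_true = actual)

# Away wins (coded 0) as the positive class
F1_Score(y_true = actual, y_pred = test_preds, positive = "0")
Precision(y_true = actual, y_pred = test_preds, positive = "0")
Sensitivity(y_true = actual, y_pred = test_preds, positive = "0")
Specificity(y_true = actual, y_pred = test_preds, positive = "0")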
We apply these metrics to the predictions for the training and test samples, using any significant difference between the same metric on the training versus the test sample as an indicator of potential overfitting, and the results for the test sample alone as an indication of the general worthwhileness of a given model for an era.
RESULTS
The results of the analysis are summarised in the table below (which can be clicked to access a larger version).
Observations on the data
We have at least 250 games in every defined era, which should be enough to estimate the model coefficients and derived values with reasonable precision
Home teams have won between 57% and 64% of all decided games in an era. It was at 63% in the 1995-1999 era and has fallen or remained constant in the five eras since, so much so that we might be heading for an all-time low.
Observations of model performance
Comparing, firstly, the performance metrics when measured in-sample versus on the test set, we see no obvious signs of massive overfitting in any era, and certainly not overall. In general though, as we'd expect, the performance metrics are less impressive for the test data than they are for the training data, especially in the 1990 to 2009 period.
The models seem particularly to struggle with forecasting away team wins in the test data for the 2000 to 2009 period, relative to how well they do on the training data for the same period. That could be a flaw in MoSHBODS, a reflection of the true difficulty of forecasting away wins for the particular games in the test sets in those eras, or a combination of both.
Overall, I think it’s fair to conclude that the models generally do well enough on test data that we can take at least some notice of them when it comes to which variables best predict home wins.
Observations of variable performance
One of the most striking features of the table is the number of eras in which Home Team Defensive Rating carries the highest variable importance value of 1. Only in the 1970 to 1974 era is it relegated to minor importance.
The three other team rating-related inputs vary in importance from era to era, but overall wind up being about equally important across the entire history of the sport, with Away Team Defensive Rating generally slightly more important than the others.
Interestingly, Venue Performance Values (which are just a logical extension of measures of Home Ground Advantage) are, on average, only about one-quarter as important as Home Team Defensive Ratings, although Home Team VPVs have spiked in importance in recent times (potentially, I would suggest, because the gaps in underlying ability between most teams have been narrowing).
IMPLICATIONS AND FINAL THOUGHTS
The first thing to stress is that there is no guarantee that the findings here reflect the reality of the V/AFL. They will only do so to the extent that the MoSHBODS System appropriately models all points in V/AFL history.
If we take that as given, however, for our current purposes, then a clear finding of the analysis here is that teams' defensive abilities have played a substantially bigger role in their success or failure than have their offensive abilities across the majority of eras in V/AFL football.
The other significant finding, I'd suggest, is that the impact of game venuing is somewhat smaller than I'd have thought pre-analysis. That finding too, of course, is dependent on the adequacy of the MoSHBODS System.