Team Rating Revisited: A Rival for MoSSBODS
Last year, predictions based on the MoSSBODS Team Rating System proved themselves to be, in Aussie parlance, "fairly useful". MoSSBODS correctly predicted 73% of the winning teams, recorded a mean absolute error (MAE) of 30.2 points per game, and its opinions guided the Combined Portfolio to a modest profit for the year. If it had a major weakness, it was in its head-to-head probability assessments, which, whilst well-calibrated in the early part of the season, were at best unhelpful from about Round 5 onwards.
As the 'SS' in its name reflects, MoSSBODS is based solely on teams' scoring shot performances. It doesn't distinguish between goals and behinds, so a team registering 15.2 is considered to have performed exactly as well as one that registered 7.10 against the same opposition. Its Ratings are also measured in terms of scoring shots - above or below average - so, for example, a team with a +3 Offensive Rating is expected to create 3 more scoring shots than the (previous season) all-team average when playing a zero-rated team at a neutral venue.
The rationale for using a team's scoring shots rather than its score in determining ratings is the fact that a team's accuracy or conversion rate - the proportion of its scoring shots that it converts into goals - appears to be largely random, in which case rewarding above-average conversion or punishing below-average conversion would be problematic.
Conversion is not, however, completely random, since, as the blog post just linked reveals, teams with higher offensive ratings, and teams facing opponents with lower defensive ratings, tend to be marginally more accurate than the average team.
So, if better teams tend to be even slightly more accurate, maybe higher accuracy should be given some weighting in the estimation of team ratings.
Enter MoSHBODS - the Matter of Stats Hybrid Offence-Defence System.
MoSHBODS Mechanics
Fundamentally, MoSHBODS is exactly the same as MoSSBODS. It too is an Elo-style rating system, and it has the same set of underlying equations:
1. New Defensive Rating = Old Defensive Rating + k x (Actual Defensive Performance – Expected Defensive Performance)
2. Actual Defensive Performance = All-Team Average Score – Adjusted Opponent’s Score
3. Expected Defensive Performance = Own Defensive Rating – Opponent’s Offensive Rating + Venue Adjustment / 2
4. Adjusted Opponent’s Score = f x Opponent's Score if Converted at All-Team Average + (1-f) x Actual Opponent's Score
5. New Offensive Rating = Old Offensive Rating + k x (Actual Offensive Performance – Expected Offensive Performance)
6. Actual Offensive Performance = Adjusted Own Score – All-Team Average Score
7. Expected Offensive Performance = Own Offensive Rating – Opponent’s Defensive Rating + Venue Adjustment / 2
8. Adjusted Own Score = f x Own Score if Converted at All-Team Average + (1-f) x Actual Own Score
9. Venue Adjustment = Own Venue Performance Value - Opponent's Venue Performance Value
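To make the update rules concrete, here is a minimal sketch of a single post-game update in Python. This is my illustration, not MoS code: the function name and the dict-based data layout are assumptions, and the adjusted scores of equations (4) and (8) are taken as given (they're covered in the Adjusted Scores section below).

```python
def update_ratings(team, opponent, adj_own_score, adj_opp_score,
                   all_team_avg, k):
    """One MoSHBODS-style rating update for `team` after a game.
    `team` and `opponent` are dicts holding 'off' and 'def' ratings
    plus the 'vpv' already looked up for this venue; scores are the
    accuracy-adjusted scores of equations (4) and (8)."""
    venue_adj = team['vpv'] - opponent['vpv']                      # (9)

    actual_def = all_team_avg - adj_opp_score                      # (2)
    expected_def = team['def'] - opponent['off'] + venue_adj / 2   # (3)

    actual_off = adj_own_score - all_team_avg                      # (6)
    expected_off = team['off'] - opponent['def'] + venue_adj / 2   # (7)

    return {'def': team['def'] + k * (actual_def - expected_def),  # (1)
            'off': team['off'] + k * (actual_off - expected_off),  # (5)
            'vpv': team['vpv']}
```

The same function would be called once for each team in a game, with the roles of 'own' and 'opponent' swapped.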
Also, just like MoSSBODS:
- Teams are given initial Offensive and Defensive Ratings of 0 prior to their first game.
- Teams carry 70% of their Rating from the end of one season to the start of their next season.
- Sydney, in its first year, is considered to be a continuation of South Melbourne, the Western Bulldogs a continuation of Footscray, and the Brisbane Lions a continuation of Fitzroy (the Brisbane Bears therefore being assumed to disappear for Ratings purposes at the end of 1996). The Kangaroos and North Melbourne are also treated as the same team regardless of the name used in any particular season.
- In equations (2) and (6), the All-Team Average Score has been set as the average score for all teams in all games from the preceding season.
- In all other equations, acceptable parameter values have been determined using the entire history of V/AFL football, broadly seeking to minimise the Mean Absolute Error (MAE) in predicted game margins relative to actual game margins. In selecting parameter values, round numbers were preferred to non-round numbers (eg a Carryover of 70% would be preferred to, say, 68.2%) and better fits to later eras (the last 15 years) were preferred to better fits for earlier eras, provided that this could be achieved without damaging the all-time fit "too much". If that sounds a little vague, that's because it is. I don't think it's ever possible - and certainly not desirable - to entirely separate the model from the modeller. (Some unkind people would suggest that's why so many of my models seem complex and ugly ...)
For MoSHBODS, the year 2016 was used as the holdout year, so no optimisation was performed using data from that year.
Like MoSSBODS, MoSHBODS splits the season into five sections, dividing the home-and-away portion into four unequal pieces and placing the finals into a fifth piece of its own. The optimal k's for each of those pieces are as follows:
- k = 0.13 for Rounds in slightly less than the first one-third of the Home and Away Season
- k = 0.09 for Rounds in slightly less than the second one-third of the Home and Away Season
- k = 0.08 for Rounds in slightly less than the final one-third of the Home and Away Season
- k = 0.09 for (generally) the last 3 rounds of the Home and Away season
- k = 0.06 for Finals
MoSHBODS has a flatter profile of k's than does MoSSBODS for all but the earliest portion of the home-and-away season. It has a higher k, however, for that early portion, which allows its ratings to more rapidly adjust to the revealed team abilities of the current season.
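For concreteness, the schedule might be encoded as below. The exact round cutoffs are described only loosely above ("slightly less than one-third"), so the boundaries in this sketch are my assumptions, not MoSHBODS' actual values.

```python
def moshbods_k(round_number, n_ha_rounds=23, is_final=False):
    """Illustrative k schedule; the section boundaries are assumed."""
    if is_final:
        return 0.06
    if round_number > n_ha_rounds - 3:   # (generally) the last 3 rounds
        return 0.09
    third = (n_ha_rounds - 3) / 3        # remaining rounds in rough thirds
    if round_number <= third:
        return 0.13                      # early season: adjust fastest
    if round_number <= 2 * third:
        return 0.09
    return 0.08
```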
ADJUSTED SCORES
Equations (4) and (8) are a little different for MoSHBODS and are the reason for its "hybrid" designation. Their role is to adjust a team's actual score to make it a mixture of:
- The score they actually recorded
- The score they would have recorded had they converted their scoring shots at the all-team average rate from the previous season
The value of f in the equation determines the extent to which the adjusted score is dragged away from the actual score, with larger values producing larger differences. We can think of MoSSBODS as having an f of 1 since it puts no weight at all on a team's actual score and looks only at the number of scoring shots it registered. The optimal value of f for MoSHBODS has been determined to be 0.65, so it takes some account of a team's actual score, but places almost twice as much weight on a team's scoring shot production.
Let's work through an example to see how this works in practice.
Consider a team that kicked 15.4.94 to its opponent's 9.10.64. Assume that the all-team conversion rate in the previous season was 53%.
The team's adjusted score would therefore be:
0.65 x (53% x 19 x 6 + 47% x 19 x 1) + 0.35 x 94 = 65% x 69.35 + 35% x 94 = 78 points
That 78 point figure is a mixture of the 69.4 points the team would have scored if they'd converted their 19 scoring shots at 53% rather than at 79%, and of the 94 points that they did score. Overall, their score is reduced by about 16 points because they converted at an exceptionally high rate. They still, however, receive about a 9 point credit for their above-average accuracy.
Their opponent's adjusted score would be:
0.65 x (53% x 19 x 6 + 47% x 19 x 1) + 0.35 x 64 = 65% x 69.35 + 35% x 64 = 67 points
Their 67 point figure is a mixture of the 69.4 points the team would have scored if they'd converted at 53% rather than 47%, and the 64 points that they did score. Their score is increased by 3 points because they converted at a rate marginally lower than the expected rate of 53%. That is, though, about a 2 point penalty relative to what they'd have been credited with if the average conversion rate was used.
In this example then, the 30-point actual win is reduced to an 11-point win once the adjustments have been made. Note that MoSSBODS would have treated this game as a 19-all (scoring shot) draw.
It's a subjective call, but it feels to me as though MoSHBODS' assessment is more appropriate here.
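For anyone wanting to check the arithmetic, the adjustment is only a few lines of Python (the function name and signature are mine):

```python
def adjusted_score(goals, behinds, avg_conversion=0.53, f=0.65):
    """Equations (4)/(8): blend the score the team would have recorded
    converting its scoring shots at the all-team average rate with the
    score it actually recorded."""
    shots = goals + behinds
    actual = 6 * goals + behinds
    at_avg_conversion = shots * (6 * avg_conversion + (1 - avg_conversion))
    return f * at_avg_conversion + (1 - f) * actual

home = adjusted_score(15, 4)   # 77.98 - the ~78 points above
away = adjusted_score(9, 10)   # 67.48 - the ~67 points above
print(round(home - away, 1))   # 10.5 - the ~11-point adjusted win (78 - 67 after rounding)
```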
Across the entirety of V/AFL history, the MoSHBODS approach changes the result (ie changes the winning team, switches a draw to a result, or switches a result to a draw) in only 9% of games. MoSSBODS changes the result almost twice as often.
VENUE PERFORMANCE
MoSHBODS also estimates Venue Performance Values (VPVs) for every team at every ground, but does not use a separate Travel Penalty, instead allowing the Venue Performance values to absorb this component. This makes the VPVs very straightforward to interpret.
Another benefit of this approach is that it allows the effects of interstate travel to vary by team and by venue. MoSSBODS, by comparison, simply imposes a fixed 3 scoring shot penalty on any team playing outside its home state against a team playing in its home state. The apparent variability of the effects of travel on different teams at different venues is clearly reflected in the matrix below, which shows MoSHBODS' VPVs for current teams and current venues as at the end of the 2016 season.
Numbers in this table can be interpreted as a regularised estimate (more on that below) of the number of points the team would be expected to score above or below what is implied by the difference between its own and its opponent's offensive and defensive ratings. We can think of these numbers as a logical extension of the home ground advantage notion, but one that recognises that not all away grounds are the same for every team.
As we might expect, most teams enjoy positive VPVs at their home ground or grounds. These values are shaded grey in the table. Adelaide, for example, enjoys a +5.1 VPV at the Adelaide Oval, and Port Adelaide a +5.0 VPV at the same ground.
The only teams with negative VPVs at their home grounds are Essendon (-0.9 at the MCG and -0.4 at Docklands), and Richmond (-0.8 at the MCG).
We can see the differential effects of travel and interstate venues on teams by scanning down the columns for each team. Adelaide, for example, faces a -7.1 VPV at the Gabba and a -7.0 VPV at Kardinia, but actually enjoys a +1.5 VPV at Docklands.
The process for estimating VPVs is as follows.
Firstly, VPVs are assumed to be zero for any venue at which a team has played 3 times or fewer. Thereafter, the VPV is calculated as a fraction of the average over- or under-performance ("excess performance") of the team at that venue across the (at most) 100 most recent games.
A team's excess performance is defined as the difference between the actual adjusted margin of victory and the margin of victory we'd have expected based solely on the ratings of the teams involved (ie without adjusting for venue effects).
These excess performances are averaged and then damped (or, in the lingo, "regularised") by taking 45% of the result. This serves to prevent extreme VPV values for venues where teams have played relatively few games with very unexpected results, and reduces the impact of such extreme results even at venues where a team has played regularly.
Again, a small example might help.
Imagine a team that has played 4 games at a particular venue:
- Game 1: Adjusted Margin -2.7, Expected Margin (based on team ratings only) +5.3
- Game 2: Adjusted Margin +32.7, Expected Margin (based on team ratings only) +1.2
- Game 3: Adjusted Margin -83.6, Expected Margin (based on team ratings only) -12.2
- Game 4: Adjusted Margin +33.5, Expected Margin (based on team ratings only) +22.6
The excess performance values are thus -8.0, +31.5, -71.4 and +10.9, and their average is -9.25. We take 45% of this to obtain a VPV of -4.2 for this team at this venue.
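In code, the whole VPV procedure is short. This is a sketch under the definitions above, not the MoS implementation, and the names are mine.

```python
def venue_performance_value(excess_margins, damping=0.45,
                            min_games=4, window=100):
    """Regularised VPV: 45% of the average excess performance over the
    (at most) 100 most recent games at the venue, and zero until the
    team has played there more than 3 times."""
    if len(excess_margins) < min_games:
        return 0.0
    recent = excess_margins[-window:]
    return damping * sum(recent) / len(recent)

# The worked example above: excess = adjusted margin - expected margin
excess = [-2.7 - 5.3, 32.7 - 1.2, -83.6 - (-12.2), 33.5 - 22.6]
print(round(venue_performance_value(excess), 1))  # -4.2
```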
CONVERTING MARGINS TO PROBABILITIES
MoSSBODS uses a single logistic equation to convert expected victory margins, measured in terms of scoring shots, into victory probabilities. In this respect it probably suffers a little from changes in the typical number of scoring shots recorded in games from different eras, which might make a (say) +4 SS advantage worth more or less at different times.
MoSHBODS uses the insight from this earlier blog on the eras in V/AFL football to split history into six eras:
- 1897-1923
- 1924-1950
- 1951-1961
- 1962-1977
- 1978-1995
- 1996-2015
A separate logistic equation is then fitted to each era (2016 is used as a holdout for the final, current era).
The efficacy of this approach is borne out by the variability in the fitted exponents across the eras, which range from a low of 0.0425 for the 1978 to 1995 period, to a high of 0.0649 for the 1897 to 1923 period. This means, for example, that a predicted 12 point win would be mapped to a 68.5% probability in 1900, but a 62% probability in 1990.
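The post doesn't spell out the functional form, but a standard logistic curve with those exponents reproduces the quoted figures, so something like the following is presumably what's being fitted (the function name is mine):

```python
import math

def victory_probability(expected_margin, exponent):
    """Map an expected victory margin (in points) to a win probability
    via a standard logistic curve. The form is my assumption; it
    reproduces the figures quoted above."""
    return 1 / (1 + math.exp(-exponent * expected_margin))

print(round(victory_probability(12, 0.0649), 3))  # 0.685 (1897-1923 exponent)
print(round(victory_probability(12, 0.0425), 3))  # 0.625 (1978-1995 exponent)
```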
More broadly, as you can see from the diagram below, a given expected victory margin has been associated with a smaller victory probability as the average levels of scoring have increased over time.
BIAS ADJUSTMENT
It's curious to me, but all of my Elo rating models have consistently underestimated game margins from the home team's perspective. I feel as though there's something profound about this, but the source of this profundity continues to elude me. In any case, for MoSHBODS, the bias amounts to about 2 points per game.
Analysis shows that this bias stems from overestimating the expected score of away teams during the home-and-away season, so the final adjustment we make in converting MoSHBODS ratings into team score and margin predictions is to subtract a fixed 2 points from the designated away team's score for all home-and-away season games.
Even more curiously, MoSHBODS' total score predictions are superior without this bias adjustment, so this year we'll find that MoSHBODS' official predictions will not add up, in the sense that the predicted home score plus the predicted away score will not match the predicted total score.
Odd, I know, but it turns out that sometimes the best estimator of (A+B) isn't the best estimator of A plus the best estimator of B. Again, seems profound; can't find the nature of it.
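For what it's worth, one plausible mechanism (my speculation only): under absolute error, the best point estimate of a quantity is its median, and the median of a sum needn't equal the sum of the medians. A contrived three-game example makes the point:

```python
from statistics import median

a = [0, 1, 2]   # one quantity across three toy games
b = [5, 0, 0]   # another quantity in the same games

print(median(a) + median(b))                   # 1: best estimate of A plus best of B
print(median([x + y for x, y in zip(a, b)]))   # 2: best estimate of (A + B)
```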
PERFORMANCE
So, is MoSHBODS better than MoSSBODS?
Slightly, but demonstrably.
Specifically:
- It has a higher LPS (Log Probability Score) in 66% of seasons, but in only 41% of the seasons since 2000 (though its season-average LPS is marginally higher than MoSSBODS' across that period)
- It has a smaller MAE for Home team scores in 64% of seasons, and in 53% of the seasons since 2000
- It has a smaller MAE for Away team scores in 62% of seasons, and in 53% of the seasons since 2000
- It has a smaller MAE for game margins in 60% of seasons, and in 53% of the seasons since 2000
- It has a smaller MAE for total scores in 65% of seasons, and in 65% of the seasons since 2000
As well, in the only season that is post-sample for both MoSHBODS and MoSSBODS, 2016, MoSHBODS outperformed MoSSBODS on all five of these measures, in many cases by a significant margin.
The comparative results were:
- LPS of Victory Probabilities: +0.2341 (MoSSBODS +0.2059)
- MAE for Home Team Score: 18.75 points (MoSSBODS 19.14)
- MAE for Away Team Score: 19.36 points (MoSSBODS 20.07)
- MAE for Margin: 29.07 points (MoSSBODS 30.22)
- MAE for Total Score: 24.73 points (MoSSBODS 24.80)
Past performance, of course, is no indication of future performance. But I'm encouraged ...
CURRENT TEAM RATINGS
Here are MoSHBODS' team ratings as at the end of the 2016 season.
These ratings have the teams ranked quite similarly to the MoSSBODS rankings. The major differences are that:
- MoSHBODS ranks the Western Bulldogs two places lower on Offence
- MoSHBODS ranks West Coast two places higher on Offence
- MoSHBODS ranks Richmond two places higher on Offence
- MoSHBODS ranks Port Adelaide two places higher on Defence
No other team's ranking differs by more than a single place between MoSSBODS and MoSHBODS.
WHAT NEXT?
If nothing else, MoSHBODS' predictions will be presented along with MoSSBODS' for season 2017. I've not yet decided whether to use it to inform the three Funds, though that is a consideration, even if only for the Head-to-Head Fund.
I'm genuinely looking forward to seeing how MoSHBODS performs relative to MoSSBODS and the other MoS Tipsters and Predictors.
More news as it comes to hand ...