The Dynamics of ChiPS Ratings: 2000 to 2013

Visitors to the MatterOfStats site in 2014 will be reading about ChiPS team Ratings and the new Margin Predictor and Probability Predictor that are based on them, which I introduced in this previous blog. I'll not be abandoning my other team Ratings System, MARS, since its Ratings have proven to be so statistically valuable over the years as inputs to Fund algorithms and various Predictors, but I will be comparing and contrasting the MARS and the ChiPS Ratings at various times during the season.

What I like about ChiPS Ratings is their natural interpretation: the difference between the ChiPS Ratings of two teams is a direct indicator of the difference in their abilities measured in terms of points-scoring. MARS Ratings do not share this interpretation, which is why any Margin Predictor I've created using MARS Ratings (for example, this one) has usually applied some deflation factor to team Ratings or the difference in them.

As an example of what this means for ChiPS Ratings, we could say that a team ChiPS Rated at 1,020 is a 20-point better team than another Rated 1,000 and would be expected to defeat such a team by 20 points before we adjust for any home ground advantage (HGA), relative difference in recent form, and the Interstate Status of the contest.
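To make the arithmetic concrete, here's a minimal sketch of how a margin expectation of this kind could be formed. The adjustment terms for home ground advantage, recent form and Interstate Status are hypothetical placeholders, not the actual ChiPS adjustments.

```python
# A minimal sketch of a ChiPS-style margin expectation. The adjustment
# parameters are hypothetical placeholders; the actual ChiPS adjustments
# for HGA, form and Interstate Status are not reproduced here.

def expected_margin(home_rating, away_rating,
                    home_ground_advantage=0.0,
                    form_difference=0.0,
                    interstate_adjustment=0.0):
    """Expected victory margin (in points) for the home team."""
    return (home_rating - away_rating
            + home_ground_advantage
            + form_difference
            + interstate_adjustment)

# With no adjustments, a 1,020-Rated team is expected to beat a
# 1,000-Rated team by 20 points.
print(expected_margin(1020, 1000))  # 20.0
```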

In this blog I'll be examining the characteristics of team ChiPS Ratings over the period 2000 to 2013. (Although the Ratings were created using the entire period from 1999 to 2013, I've excluded the first season from this analysis to minimise the potentially distorting influence of that season given that it was a "calibrating" year where all teams' Ratings were initially set to 1,000. On this same basis a case could be made for excluding 2011 and 2012 as these were similarly calibrating years for the Gold Coast and GWS, but I've retained these seasons in the analysis because the distortion in those seasons affects, at most, a single team.) 

Team Ratings

Here, firstly, are the team Ratings as at the end of last season.

The first column of data is the teams' ChiPS Rating at the end of the home and away portion of the 2013 season and reveals that Hawthorn were a 6-point better team than any other team in the competition according to ChiPS at that point, and that GWS were about a 13-point worse team than any other team and a whopping 94-point worse team than the Hawks.

Comparing these end-of-regular-season rankings with the teams' final ladder positions - ignoring Essendon's notional demotion to 9th by the AFL - we find a high level of agreement, the notable exceptions being the Kangaroos, who were ranked 3rd by ChiPS but finished 10th on the ladder, Fremantle, ranking 7th but laddering 3rd, and Essendon, whom ChiPS ranked 11th but who ended the season 7th on the ladder, supplements scandal aside.

On the right-hand side of the table I've shown the team Ratings after the Grand Final. Because ChiPS applies a very small multiplier to the difference between the Actual and Expected margin in Finals - only about one-fifth to one-sixth of the multiplier it uses for other portions of the season - team Ratings change very little over the 4 weeks of the Finals campaign, with no team altering its Rating by more than a single Rating Point (RP). The only changes in team rankings as a result of these Rating changes were swaps between Collingwood and Richmond for 4th and 5th, and between Sydney and Fremantle for 6th and 7th.
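For illustration, a ChiPS-style Rating update can be sketched as a multiplier applied to the difference between the Actual and Expected margin, with the home team's gain equal to the away team's loss. The multiplier values below are purely illustrative, chosen only to reflect the relative pattern described in this post, and are not the actual ChiPS parameters.

```python
# Illustrative multipliers only: larger early and late in the home and away
# season, much smaller (roughly one-fifth) in Finals, per the description above.
ILLUSTRATIVE_MULTIPLIERS = {
    "rounds_1_to_6": 0.10,
    "rounds_7_to_11": 0.07,
    "rounds_12_to_17": 0.07,
    "rounds_18_onwards": 0.10,
    "finals": 0.02,
}

def update_ratings(home_rating, away_rating,
                   actual_margin, expected_margin, season_segment):
    """Return updated (home, away) Ratings after a single game."""
    k = ILLUSTRATIVE_MULTIPLIERS[season_segment]
    change = k * (actual_margin - expected_margin)      # home team's gain
    return home_rating + change, away_rating - change   # zero-sum by design
```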

The Finals did little to reduce the level of congestion towards the top of the Ratings ladder, with only 5 RPs separating the 2nd-placed Cats (1,025.2) from the 7th-placed Swans (1,020.4). And there's more congestion further down the list with only about 10 RPs separating the 10th-placed Port Adelaide (996.6) from the 15th-placed West Coast (986.3).

Outside of those two clusters of teams, the ChiPS Ratings suggest that the Hawks stand alone at the top, that Adelaide and Carlton form a very small island between The Talented and The Mediocre, that Gold Coast is slightly adrift but only a few solid games away from joining the latter, and that the Dees and the Giants are each an island in themselves bobbing around casting for wins while waiting for a draft (Worst. Pun. Ever.) to propel them somewhere better. 

Looking back across the period from 2000 to 2013 we see ebbs and flows in the Ratings profile for each team similar to those we've seen in charts based on MARS Ratings.

One of the more striking aspects of this chart for me is the narrow bands in which the Swans' and the Roos' Ratings have tracked across the entire period and the comparative boom-and-bust nature of the Ratings for other teams such as Carlton, Essendon, St Kilda, West Coast and the Western Bulldogs. That said, it's generally true that team ChiPS Ratings bounce around within a narrower range than do team MARS Ratings, as the following chart makes readily apparent.

What we see in this chart is that teams Rated above 1,000 tend to be Rated more highly by MARS than by ChiPS, while for teams Rated below 1,000 the opposite is true. The two Rating Systems produce highly correlated Ratings, however (despite the fact that GWS' Rating is initially set to 900 and not 1,000 in the MARS Rating data), and the equation:

ChiPS Rating = 352.6 + 0.6477 x MARS Rating

has an R-squared of 91.6%.
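As a quick sketch, the published relationship can be applied directly; recovering a fit of this kind from the paired Ratings themselves (which aren't reproduced here) would be a simple least-squares exercise.

```python
import numpy as np

def chips_from_mars(mars_rating):
    """Approximate ChiPS Rating implied by a MARS Rating (R-squared ~92%)."""
    return 352.6 + 0.6477 * mars_rating

print(round(chips_from_mars(1000), 1))  # 1000.3

# Given paired series of MARS and ChiPS Ratings, a fit like the one above
# could be recovered with, for example:
# slope, intercept = np.polyfit(mars_ratings, chips_ratings, 1)
```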

The Distribution of Rating Changes per Game

To get a feel for the size of the typical Rating change that a single game is likely to precipitate, let's firstly look at the distribution of these changes across all games from 2000 to 2013.

Since ChiPS Rating changes in any game are zero sum - one team's gain is its opponent's loss - the distribution is symmetric with zero mean by design. The standard deviation of the distribution is 2.92 RPs per game and, while it might look Normal, its kurtosis is 3.88, which makes it somewhat more peaked than a Normal. In other words, relative to the Normal distribution there are more very small and more extreme ChiPS Rating changes per game than we'd expect, and fewer moderate ones. (An Anderson-Darling test for Normality comfortably rejects the null hypothesis that the Rating changes are Normally distributed. The p-value is so small there are people still looking for the first 1 in its binary representation.)
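For anyone wanting to reproduce these summary statistics for a series of Rating changes, here's a sketch using simulated data in place of the actual per-game changes; the SD, kurtosis and Anderson-Darling figures quoted above come from the real data, not from this simulation.

```python
import numpy as np
from scipy import stats

# Simulated stand-in for the per-game Rating changes (one entry per team per
# game, so the distribution is symmetric by construction in the real data).
rng = np.random.default_rng(0)
rating_changes = rng.standard_t(df=8, size=2500) * 2.7  # heavier-tailed than Normal

print("SD:", np.std(rating_changes))
print("Kurtosis (Normal = 3):", stats.kurtosis(rating_changes, fisher=False))

# Anderson-Darling test for Normality: Normality is rejected if the test
# statistic exceeds the critical value at the chosen significance level.
result = stats.anderson(rating_changes, dist='norm')
print("A-D statistic:", result.statistic)
print("5% critical value:", result.critical_values[2])
```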

ChiPS applies different multipliers to game results at different points in the season, and the colour-coding above reflects that compartmentalisation. The small multiplier used for Finals is reflected in the narrow range of values for which there are pink sections of the bars, and the larger multipliers used for Rounds 1 to 6 and Rounds 18 onwards in the home and away season are reflected in the broader range of values spanned by the peach and blue sections of the bars.

I've created the following chart to make this result a little clearer.

The only portions of the season that have produced games leading to Ratings changes exceeding 10 RPs have been Rounds 1 to 6, which have spawned five such games, and Rounds 18 onwards in the home and away season, which have delivered just two. All seven of these games have been exceptionally high-scoring, with the winner scoring about 150 points or more and the loser scoring about 35 to 65. Richmond has been on the wrong side of two of the games, losing 167-52 to the Dogs in R1 of 2006 and 222-65 to the Cats in R6 of 2007. The Gold Coast has also been a major Ratings loser in one game, having been thumped 171-52 by the Blues in R2 of 2011.

Melbourne has also made two inglorious appearances in the list courtesy of a 233-47 drubbing at the hands of the Cats in R19 of 2011 and a 184-36 battering by the Dons in R2 of 2013. Collingwood and GWS have made one appearance each, the Pies' coming as a 149-53 loss to the Cats in R24 of 2011, and the Giants' as a 183-54 defeat by the Roos in 2012.

Differentiating Ratings changes on the basis of designated home and away team status, we find that, across the 14 seasons, there's been very little difference in the profile of the Rating changes for the two groups.

The average Rating change for designated home teams has been -0.015 RP per game with a standard deviation of 2.92 RP per game, while that for away teams has been +0.015 RP per game with the same standard deviation as for home teams. This can be interpreted as implying that, overall, ChiPS Ratings have produced margin predictions that have been very close to unbiased when viewed on a home team/away team basis.

Home teams have, however, fared a little better in games from Rounds 12 to 17 where they've, on average, gained 0.08 RPs per game, and in Finals, where they've gained 0.05 RPs per game. Away teams have fared best in Rounds 7 to 11 where they have, on average, gained 0.11 RPs per game.
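A split of this kind is straightforward to produce if the game-by-game Rating changes are held in a table. The column names below are hypothetical, chosen only to illustrate the grouping.

```python
import pandas as pd

# Sketch of the home/away split above, assuming a DataFrame of games with
# hypothetical columns 'season_segment' (e.g. "Rounds 7 to 11", "Finals"),
# 'home_rating_change' and 'away_rating_change'.
def home_away_summary(games: pd.DataFrame) -> pd.DataFrame:
    return (games
            .groupby('season_segment')[['home_rating_change',
                                        'away_rating_change']]
            .agg(['mean', 'std']))
```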

Moving next to a season-by-season view we see some distinct differences in the Rating change distributions for some.

Seasons 2000 and 2011 produced the broadest spread of Rating changes, the standard deviations for these seasons being 3.24 and 3.23 RP per game respectively. The narrowest spreads of Rating changes were produced in 2003 (2.59 RP per game) and 2009 (2.64 RP per game). The peakedness of the distributions for every year is greater than that associated with a Normal distribution (ie the kurtoses are all greater than 3) though the 2001 distribution is only very marginally more peaked than a Normal and does not, in fact, allow us to reject the null hypothesis of Normality using the Anderson-Darling test.

The mean, standard deviation, skewness and kurtosis of Rating changes for various subgroups of games are recorded in the following table.

One aspect of this table worth noting is the relatively narrow range of standard deviations shown, regardless of whether we partition games on the basis of season, home team/away team status, or team. (As noted earlier, the different multipliers attached to the different portions of the season result in markedly different standard deviations if we use Round as the basis of partitioning, however.)

These standard deviations range from about 2.8 to 3.2 RPs per game. If Rating changes were Normally distributed we could then use the rule of thumb that the median absolute deviation (MAD) equals about two-thirds of the standard deviation to infer that the median absolute Rating change would be about 2 RP per game. Given the extra peakedness of the Rating change distributions we might expect the actual MAD would be a little less than 2 RP, but it's nonetheless a useful heuristic to assume that about half the Ratings changes we'll see in practice will be smaller than 2 RPs and about half will be greater. (The actual median is about 1.75 RPs).
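The rule-of-thumb arithmetic looks like this:

```python
# For a Normal distribution the median absolute deviation is roughly
# 0.6745 times the standard deviation.
for sd in (2.8, 3.2):
    print(sd, "->", round(0.6745 * sd, 2))
# 2.8 -> 1.89 and 3.2 -> 2.16: roughly 2 RPs per game, a little above the
# actual median absolute Rating change of about 1.75 RPs.
```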

I'll finish by providing the Ratings change data for each team when it's playing at home.

Rating changes only come about at the end of a game because there is a difference between the Expected and the Actual game margin. This difference can be at least partially attributed to errors in the Ratings of the participating teams or in the adjustments made for recent form, home ground advantage or Interstate Status. Ideally, if we were to look at these errors for a particular team we'd prefer them to be small and to average out to roughly zero.

We therefore might assess the relative accuracy with which ChiPS has predicted the game margins for a team by combining the mean and standard deviation of the Rating changes for that team. That leads us fairly naturally to the root mean square error measure:

RMSE = sqrt(Mean of Rating Changes^2 + Standard Deviation of Rating Changes^2)

Using the values provided in the earlier table we can calculate the RMSE using this equation for each team and determine how accurate ChiPS has been in estimating each team's final game margins.

On this metric, ChiPS has been best at predicting the margins for Sydney home games and worst at predicting them for GWS home games. Looking back at the earlier chart of Rating changes by team, this is reflected in the relatively narrow distribution for Sydney, which has a mean near zero (+0.22 RPs), and in the very flat and dispersed distribution for GWS, which also has a mean some distance from zero (-1.89 RPs).

The distribution for the Gold Coast is quite similar to that for GWS, though the mean is nearer zero (-0.76 RPs) and the standard deviation is a little larger (3.25 vs 3.18 RPs). Combined, the bias and variance of the Rating changes for Gold Coast are slightly smaller than those for GWS.
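Plugging the means and standard deviations quoted above into the RMSE measure bears this out:

```python
from math import sqrt

def rmse_from_mean_and_sd(mean_change, sd_change):
    """RMSE implied by the mean and standard deviation of Rating changes."""
    return sqrt(mean_change ** 2 + sd_change ** 2)

print(round(rmse_from_mean_and_sd(-1.89, 3.18), 2))  # GWS: about 3.70
print(round(rmse_from_mean_and_sd(-0.76, 3.25), 2))  # Gold Coast: about 3.34
```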

Contributing to the relatively poor predictions for GWS and Gold Coast is the fact that the period we're considering spans their introduction into the competition at which time they were assigned an initial Rating of 1,000 by ChiPS (as were the other teams, but in 1999, the season I've excluded from the analysis here). As well, the absence of Finals experience for both these teams means that neither has played in that part of the season where ChiPS Rating changes tend to be smaller. 

GWS, Gold Coast and Sydney aside, however, it's interesting to note how similar the RMSEs are for all other teams. They range from Brisbane's 2.79 RPs to Carlton's 3.09. This suggests that the ChiPS System works roughly equally well in estimating the merits, and hence results, for all 15 of these teams.