Matter of Stats


What Do Seasons Past Tell Us About Seasons Present?

I've looked before at the consistency in the winning records of teams across seasons, but I've not previously reported the results in any great detail. For today's blog I've stitched together the end-of-season home-and-away ladders for every year from 1897 to 2012, which has allowed me to create a complete time series of the performances of every team that's ever played.

In future blogs I'll investigate other metrics but, for today, I'll focus solely on each team's winning percentage - the number of home-and-away season games it won plus one half of those it drew, divided by the total number of games it played during the home-and-away season.
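
As a minimal sketch of that definition (the function name and the example ladder line are hypothetical, not taken from the data used in this post):

```python
def winning_percentage(wins, draws, games_played):
    """Home-and-away winning percentage: wins plus half of draws, over games played."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    if wins < 0 or draws < 0 or wins + draws > games_played:
        raise ValueError("wins and draws must be non-negative and fit within games_played")
    return (wins + 0.5 * draws) / games_played


# Hypothetical ladder line: 14 wins, 1 draw and 7 losses from 22 games.
example = winning_percentage(wins=14, draws=1, games_played=22)
print(f"{example:.1%}")  # 65.9%
```

A draw counts as half a win, so a team that drew every game would finish on exactly 50%.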

On a technical note, when teams have merged or changed names I've treated them as different teams, and I've stubbornly retained my determination to call the Kangaroos 'the Kangaroos' for all of the seasons since they first used that name to 2012. Being Head (and only) Analyst has its privileges.

THIS SEASON'S WINNING PERCENTAGE COMPARED TO LAST YEAR'S

Let's start with the grandest historical sweep. 

Every dot on the chart at left represents a pair of winning percentages for the same team in two consecutive seasons. The blue line is a loess fit to the data and the grey shading provides the standard error of the fit, which is widest where there's least data and therefore most uncertainty.

What's blazingly obvious from this plot is the positive relationship between a team's winning percentage in successive seasons. The correlation is +0.58, which means that over one-third of the variability in a team's winning percentage in the current season can be explained by its winning percentage in the previous season (0.58 squared is about 0.34).
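
For readers who'd like to reproduce this sort of figure, here is a rough sketch of how the season-on-season pairs might be assembled and correlated. The data layout (a dictionary keyed by team and season), the function names and the toy numbers are all assumptions made for illustration; this is not the code behind the chart.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def consecutive_pairs(records):
    """records maps (team, season) -> winning percentage.

    Returns (previous_season_pct, current_season_pct) for every team-season
    whose immediately preceding season is also present for the same team.
    """
    pairs = []
    for (team, season), pct in records.items():
        previous = records.get((team, season - 1))
        if previous is not None:
            pairs.append((previous, pct))
    return pairs


# Toy data, invented purely to show the mechanics.
toy = {
    ("A", 2010): 0.25, ("A", 2011): 0.50, ("A", 2012): 0.75,
    ("B", 2010): 0.60, ("B", 2012): 0.40,  # B has no 2011 entry, so it yields no pair
}
pairs = consecutive_pairs(toy)
r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
print(len(pairs), round(r, 2), round(r ** 2, 2))
```

Because teams are keyed by name, a renamed or merged club is automatically treated as a different team, in line with the convention described earlier.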

More subtle, perhaps, is the sparseness of data points in the upper-left and lower-right corners of the chart, hinting that teams very rarely go from truly awful to nearly unbeatable, or vice versa, from one season to the next.

It's axiomatic that, for things that are old, much of their history predates living memory. This is clearly true of the VFL/AFL competition, which means that much of the data on the preceding chart comes from earlier and different times - pre-Draft, pre-professionalism, and pre-TV; heck, pre most of us. So, perhaps this apparent and slightly surprising continuity in team performances from one season to the next is nothing more than an historical relic.

The chart at right shows that this is not the case. Each strip of the chart provides the relevant data for a particular era - the top chart spanning 1897 to 1909, the second 1910 to 1939, the next 1940 to 1959, the second-last 1960 to 1989, and the final strip 1990 to 2012. Each depicts a similar relationship in teams' performances from one season to the next.

To be completely accurate, I should note that the size of the correlation has generally declined across the eras, but the peak of +0.64 for the 1910 to 1939 era isn't that distant from the low of +0.50 for the most-recent era. So, while the Draft might have reduced the level of season-to-season correlation in team performances, it surely hasn't eliminated it.
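
A rough sketch of the era split follows, using the boundaries quoted above. Assigning each pair to an era by its current season is an assumption on my part for this example (it matters only for pairs that straddle a boundary, such as 1909 and 1910), and the names and numbers are illustrative.

```python
from collections import defaultdict

# Era boundaries as listed in the text, inclusive at both ends.
ERAS = [(1897, 1909), (1910, 1939), (1940, 1959), (1960, 1989), (1990, 2012)]


def era_label(season):
    """Return a label such as '1910 to 1939' for the era containing the season."""
    for start, end in ERAS:
        if start <= season <= end:
            return f"{start} to {end}"
    raise ValueError(f"season {season} falls outside 1897 to 2012")


def group_pairs_by_era(pairs):
    """pairs: iterable of (current_season, previous_pct, current_pct) tuples.

    Assumption: a pair belongs to the era of its current season.
    """
    grouped = defaultdict(list)
    for current_season, previous_pct, current_pct in pairs:
        grouped[era_label(current_season)].append((previous_pct, current_pct))
    return dict(grouped)


# Invented pairs, only to show the grouping.
toy_pairs = [(1910, 0.40, 0.55), (1938, 0.70, 0.65), (2012, 0.50, 0.45)]
by_era = group_pairs_by_era(toy_pairs)
print({era: len(members) for era, members in by_era.items()})
```

A correlation would then be computed separately within each group to get the era-by-era figures.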

LONGER TIMEFRAMES

There are, then, echoes of last season in this season's performances. What about echoes from the season before that?

Traces remain, but they're fainter. In quantitative terms, the correlation falls to +0.44, meaning that only about 19% of the variability in a team's current performance can be explained by its performance two seasons ago.

While the data cloud on show here is more dispersed than the one shown earlier, the relative absence of rags-to-riches and riches-to-rags performances remains: there are remarkably few teams in the quadrant representing 75% or higher winning percentages in one season and 25% or lower winning percentages two seasons later, and equally few in the diagonally opposite quadrant. 

Turning around the performance of an especially weak team, or destroying the capabilities of a particularly strong one, generally takes more than a couple of seasons to achieve, it seems.

The era-by-era view again paints a similar picture to the all-season view, one of a weaker but still present relationship.

We do see here, however, a far more dramatic reduction in the correlation for the most-recent era compared to earlier eras. For the 1897 to 1909 era, the correlation is +0.58, while for each of the three eras that follow it's in the +0.47 to +0.48 range. For the era from 1990 to 2012 it falls to +0.27, meaning that only about 7% of the variability in a team's performance in the current season can be explained by its performance two seasons prior.

Nonetheless, even in the data for the latest era we don't find a swathe of teams with huge swings in winning percentages from one season to the season after next. What we see instead is a general compression of team winning percentages into the 25% to 75% range, and teams bouncing around less predictably within that narrower range.

Finally, if we go back one season further - that is, if we compare the current season performance with that from three seasons prior - we find weaker but still non-zero relationships.

(I've made these charts smaller for space reasons. Please click on them for larger versions.)

The overall correlation is now +0.33 and the era-by-era correlations range from a peak of +0.51 for the earliest era to a low of +0.14 for the latest era. The conclusion then is that in modern times, by the time we're three years distant from a season, there's not much it can tell us about the present season. The competition has a memory, but it's more goldfish than elephant. (And, yes, I know that the popular belief about the memory span of goldfish is apocryphal.)
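
The "explained" percentages quoted in this blog are simply squared correlations. As a quick arithmetic check, the snippet below squares the all-season and 1990 to 2012 correlations reported above for gaps of one, two and three seasons; the dictionary layout and labels are mine, chosen only for illustration.

```python
def shared_variance(r):
    """Proportion of variability explained by a simple linear relationship: r squared."""
    return r * r


# Correlations as quoted in the text, keyed by the gap in seasons.
reported = {
    "all seasons": {1: 0.58, 2: 0.44, 3: 0.33},
    "1990 to 2012": {1: 0.50, 2: 0.27, 3: 0.14},
}

table = {
    label: {lag: shared_variance(r) for lag, r in by_lag.items()}
    for label, by_lag in reported.items()
}

for label, by_lag in table.items():
    for lag, share in by_lag.items():
        print(f"{label}, {lag} season(s) apart: {share:.1%}")
```

Across all seasons that's roughly 34%, 19% and 11% for gaps of one, two and three seasons, and for the 1990 to 2012 era roughly 25%, 7% and 2%.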

(This result, by the way, is consistent with the earlier blog linked to in the first paragraph of this blog, in which we found that predicting teams' current performances requires only that we use performances from last season and the season before that.)

A TEAM PERSPECTIVE

What's true for the competition as a whole need not be true of individual teams. Here we'll first take a look at the charts for the current season versus the immediately preceding season for various teams, current and historical, taken six at a time.

(Again, please click on them for larger versions.)

With a few obvious exceptions, the general picture is that every team exhibits some level of correlation between current season and previous season performance. A quantification of this correlation, as well as the correlations between seasons separated by more than a single off-season, appears in the following table.
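
A table of this kind could be built along the following lines. This is only a sketch: the record layout, the function names and the toy numbers are assumptions for the example, and returning None when a team has too few seasons is my choice rather than anything dictated by the data.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined (too few points or no spread)."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def team_lag_correlations(records, max_lag=3):
    """records: iterable of (team, season, winning_pct).

    Returns {team: {lag: correlation or None}} for lags 1..max_lag, pairing each
    season with the same team's season `lag` years earlier where both exist.
    """
    by_team = defaultdict(dict)
    for team, season, pct in records:
        by_team[team][season] = pct

    table = {}
    for team, seasons in by_team.items():
        table[team] = {}
        for lag in range(1, max_lag + 1):
            pairs = [
                (seasons[season - lag], pct)
                for season, pct in seasons.items()
                if season - lag in seasons
            ]
            earlier = [pair[0] for pair in pairs]
            later = [pair[1] for pair in pairs]
            table[team][lag] = pearson(earlier, later)
    return table


# Invented records: one steadily improving team and one single-season newcomer.
toy_records = [("Example", 2001 + i, 0.2 + 0.1 * i) for i in range(6)]
toy_records.append(("Newcomer", 2012, 0.091))
team_table = team_lag_correlations(toy_records)
print(team_table["Example"][1], team_table["Newcomer"][1])
```

A team with only one season on record has no pairs at any lag, which is why its entries come back empty.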

Hawthorn has been the team whose most recent historical performances have been the best guide to its future. Even as much as three seasons apart, the correlation in their season-long winning percentages has remained at +0.60. If you're a Hawks fan that's fantastic at times like the present when they're doing well, but more disconcerting when they've spent protracted periods at the other end of the competition ladder.

The Saints, Tigers and Blues are also models of consistency, each with this-season versus next-season correlations of +0.60 or higher. Unfortunately, for the Saints much of that consistency has been when their winning percentages have been on the wrong side of 50%.

Amongst current teams with a reasonably large sample of seasons from which to draw conclusions, the Pies and the Dons show the greatest levels of season-to-season variability in performances. For both teams only about 10% to 20% of their current season performance can typically be explained by their performance in the previous season and, by the time a season is three years old, it holds virtually no predictive value in terms of their current season performance.

Only the Footscray team of the past shows anything like this level of capriciousness of performance over an extended period. The Crows, Roos, Freo and the Lions, however, have all made impressive starts in establishing far more unpredictable histories. 

To finish, here are the team-by-team time series of winning percentages.

(The GWS chart is conspicuously free of data because the team has participated in only a single season of football, which precludes any summarisation of its history by way of a line. Just imagine that there's a single point at 9.1% in 2012 for them.)