Really Simple Margin Predictors: 2013 Review
MAFL's two new Margin Predictors for 2013, RSMP_Simple and RSMP_Weighted, finished the season ranked 1st and 2nd with mean absolute prediction errors (MAPEs) under 27 points per game. Historically, I've considered any Predictor I've created as doing exceptionally well if it's achieved a MAPE of 30 points per game or less in post-sample, live competition. A MAPE of 27 is in a whole other league.
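To be concrete about the metric: MAPE here is just the average absolute difference between a Predictor's forecast margin and the actual final margin, in points per game. A minimal sketch in Python (the example numbers are made up):

```python
def mape(predicted, actual):
    """Mean absolute prediction error, in points per game."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Three illustrative games: predicted margins vs actual margins.
print(mape([12.0, -5.0, 30.0], [20.0, 3.0, 27.0]))
```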
Of course, 2013 was an unusually predictable season, so some allowance should be made for that fact, but the RSMP Predictors outperformed even the best of the TAB Bookmaker-based Predictors, Bookie_LPSO, in RSMP_Weighted's case by almost half a point per game.
To recap for a moment (for the full details, please refer to this blog), both Really Simple Margin Predictors (RSMPs) are ensemble predictors (or, more correctly, multiple classifier systems) whose base learners are simple predictors based on a single variable such as the Home team's TAB Bookmaker price, the Away team's MARS Rating, or some transformation of a similar variable. Both RSMPs in MAFL have access to the same set of base learners, with RSMP_Simple weighting each base learner equally and RSMP_Weighted applying differential weights optimised on the basis of historical data. The optimisation process for RSMP_Weighted resulted in it ignoring some of the base learners altogether - or, more politely, it assigned them zero weight. ("It's not that I don't value your opinion, it's just that I tend to assign it zero weight ...")
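A minimal sketch of the two combination schemes, assuming each base learner has already produced a margin prediction for a game. The learner names are real but the prediction values and weights below are illustrative only, not the actual fitted weights:

```python
# Illustrative base learner predictions (predicted Home team margin, in points)
# for a single game.
base_predictions = {
    "RSMP_Home_Price": 18.2,
    "RSMP_Away_Price": 12.7,
    "RSMP_RE_Prob": 16.4,
    "RSMP_MARS_Ratio": 9.8,
}

# Hypothetical weights for illustration; note the zero weight, which drops
# that learner from the prediction entirely.
illustrative_weights = {
    "RSMP_Home_Price": 0.45,
    "RSMP_RE_Prob": 0.35,
    "RSMP_MARS_Ratio": 0.20,
    "RSMP_Away_Price": 0.00,
}

def rsmp_simple(preds):
    """Equal-weight ensemble: the plain average of the base learners' predictions."""
    return sum(preds.values()) / len(preds)

def rsmp_weighted(preds, weights):
    """Weighted ensemble: a convex combination, with weights fixed in advance
    by optimising against historical results."""
    return sum(weights[name] * p for name, p in preds.items())

print(rsmp_simple(base_predictions))
print(rsmp_weighted(base_predictions, illustrative_weights))
```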
An effective ensemble forecaster produces better forecasts than any of its constituent base learners. RSMP_Weighted did this in 2013, though only just.
The black line in the chart at left tracks the progress of RSMP_Weighted relative to whichever base learner was best-performing at the relevant stage of the season - which, for large parts of it, was the learner using only the Home team's price (RSMP_Home_Price) - and shows that RSMP_Weighted's performance was superior from the 66th game of the season onwards.
RSMP_Weighted very nearly fell behind at the end of game 109, when it led RSMP_Home_Price by only 0.00112 points per game, and it finished the season in front by only 0.0667 points per game, but it nonetheless led all-comers for the final 140-odd games of the season, which is all that can reasonably be required of it.
RSMP_Simple fared less well, only occasionally outperforming the best of the base learners at the time, and finishing the season with a MAPE 0.2991 points per game worse than RSMP_Home_Price's. Clearly, weights are important (including the zeroes).
RSMP_Home_Price was the best-performing base learner from game 177 to the end of the season, having been led for just a single game by the RSMP based on the TAB Bookmaker's Risk-Equalising Home team Probability (RSMP_RE_Prob). Prior to that, RSMP_Home_Price had led since game 135.
There are a few things I find interesting here about the relative performances of the base learners. Firstly, the difference in the performances of RSMP_Home_Price and RSMP_Away_Price is notable. There is, it seems, a lot more predictive information in the Home team's price on the TAB than in the Away team's, a fact that I've discovered in previous modelling endeavours and which is due, I suspect, to the relatively constant level of overround in the price of Home teams and the highly variable level of overround in the price of Away teams. This is why Bookie_LPSO has been such a successful Margin and Head-to-Head Probability Predictor.
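To illustrate the overround point: the inverse of a bookmaker's price is an implied probability inflated by overround, and the conjecture above is that the Home team's share of that inflation is roughly constant while the Away team's absorbs the variability. A sketch, with illustrative prices and the simplest possible normalisation (not necessarily how the TAB actually distributes its overround):

```python
def implied_probabilities(home_price, away_price):
    """Convert head-to-head prices into implied probabilities.

    The inverse prices sum to more than 1; the excess is the bookmaker's
    total overround. How that overround is split between the two teams
    isn't directly observable - if the Home team's share is relatively
    constant, the Home price is the cleaner predictive signal.
    """
    raw_home, raw_away = 1.0 / home_price, 1.0 / away_price
    total_overround = raw_home + raw_away - 1.0
    # Simplest normalisation: scale both inverse prices by the same factor.
    total = raw_home + raw_away
    return raw_home / total, raw_away / total, total_overround

# Illustrative prices: a $1.60 Home favourite against a $2.35 Away team.
print(implied_probabilities(1.60, 2.35))  # roughly 5% total overround
```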
Also, MARS Ratings-based learners are consistently poorer performers than learners based on TAB Bookmaker information. As we saw in a recent blog, the predictive efficacy of team MARS Ratings is significantly enhanced by including information about the teams' recent changes in those Ratings. The base learners used in RSMP do not include that information.
An analysis of the correlation structure of the base learners with RSMP_Weighted and RSMP_Simple is instructive.
RSMP_Weighted's game-by-game predictions this season have been most correlated with RSMP_Home_Price, RSMP_LO_RE_Prob, RSMP_RE_Prob and RSMP_LO_OE_Prob_2, which are among the top 5 base learners in terms of MAPE. What's a little surprising about this outcome is that only one of those base learners has more than a 10% weighting in RSMP_Weighted and one of them, RSMP_LO_OE_Prob_2, has a zero weighting.
In comparison, RSMP_Simple suffered from its relatively low correlation with RSMP_Home_Price - the second-lowest correlation it had with any base learner.
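For anyone wanting to replicate this sort of analysis, a minimal sketch using pandas, assuming a table of game-by-game margin predictions with one column per predictor (the column names are real, the values made up):

```python
import pandas as pd

# One row per game, one column per predictor, values are predicted margins.
predictions = pd.DataFrame({
    "RSMP_Weighted":   [18.2, -4.1, 33.0, 7.5, -11.3],
    "RSMP_Simple":     [15.9, -2.8, 28.4, 9.1, -8.6],
    "RSMP_Home_Price": [19.0, -5.0, 31.2, 6.8, -12.1],
    "RSMP_RE_Prob":    [17.1, -3.5, 29.8, 8.0, -10.4],
})

# Pearson correlations of each predictor's game-by-game predictions
# with the weighted ensemble's.
corr = predictions.corr()
print(corr["RSMP_Weighted"].sort_values(ascending=False))
```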
RSMP_Weighted applied a weighting of 40% to RSMP_Home_Price this year, while RSMP_Simple applied a weighting of just 8.5%. A post-hoc analysis of base learner predictions and actual game margins reveals that an optimal weighting would have been 53.6%, which is much closer to RSMP_Weighted's figure.
The full set of weightings for that post-hoc optimised weighted learner is as follows:
- RSMP_Home_Price : 53.6%
- RSMP_RE_Prob : 25.9%
- RSMP_MARS_Ratio_2 : 13.1%
- RSMP_MARS_Ratio : 7.4%
An ensemble predictor using these weightings would have achieved a MAPE of 26.27 points per game for season 2013, which is about 0.15 points per game better than RSMP_Weighted. Hindsight is always, of course, a fine thing indeed.
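For what it's worth, that post-hoc exercise amounts to minimising MAPE over convex combinations of the base learners' predictions. Here's one way it might be done, sketched with scipy's SLSQP optimiser on random placeholder data (the post doesn't say what optimisation method was actually used, and the data here is not the 2013 season's):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_games, n_learners = 200, 4
preds = rng.normal(0, 30, (n_games, n_learners))   # placeholder base learner predictions
actual = rng.normal(0, 36, n_games)                # placeholder actual margins

def ensemble_mape(w):
    """MAPE of the ensemble formed by weighting the base learners by w."""
    return np.mean(np.abs(preds @ w - actual))

# Weights constrained to be non-negative and to sum to 1, so zero weights
# (learners dropped entirely) can emerge naturally at the boundary.
result = minimize(
    ensemble_mape,
    x0=np.full(n_learners, 1.0 / n_learners),
    bounds=[(0.0, 1.0)] * n_learners,
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
    method="SLSQP",
)
print(result.x)            # optimal weights
print(ensemble_mape(result.x))  # the MAPE they achieve in-sample
```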