Specialist Margin Prediction: Epsilon Insensitive Loss Functions

February 16, 2012 Tony Corke

In the last blog we looked at Margin Prediction algorithms built using what I called "bathtub" loss functions, which, for given M, take the form:

If Abs(Predicted Margin - Actual Margin) < M then Loss = 0, otherwise Loss = 1 (where Abs() is the absolute value function)

In words, these algorithms incurred a loss of 1 whenever their predictions were, in absolute terms, different by M points or more from the actual margin. Otherwise, they incurred no loss. To borrow from Orwell, "Less than M points error: good; M points or more error: bad".
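As a minimal sketch, the bathtub loss described above might be written in Python as follows. The function name bathtub_loss, its argument names, and the example margins are illustrative only; they are not taken from the original modelling.

```python
def bathtub_loss(predicted_margin: float, actual_margin: float, m: float) -> int:
    """Return 0 if the absolute error is less than m points, otherwise 1."""
    return 0 if abs(predicted_margin - actual_margin) < m else 1


# Invented example: actual margin of 14 points, M = 5.
print(bathtub_loss(10, 14, 5))  # error of 4 points: less than M
print(bathtub_loss(19, 14, 5))  # error of exactly 5 points: M points or more
print(bathtub_loss(60, 14, 5))  # error of 46 points: same loss as a 5-point error
```

Note that the last two calls incur the same loss: once the error reaches M points, its size no longer matters.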

We found, among other things, that the algorithm built for M=5 was particularly good at producing predicted margins within 12 points of the actual margin, a feat that it achieved for almost 28% of games in the holdout sample.

For the current blog I've extended the range of loss functions to include what are called epsilon-insensitive loss functions, which are similar to the "bathtub" loss functions except that they don't treat absolute errors of size greater than M points equally. Instead, they use the following:

If Abs(Predicted Margin - Actual Margin) < M then Loss = 0, otherwise Loss = Abs(Predicted Margin - Actual Margin)

So, when an algorithm's absolute prediction error is M points or more, the loss it incurs is equal to its absolute prediction error. Otherwise, its loss is zero.
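Here is a rough Python sketch of the loss function exactly as defined in this blog. The function and argument names are my own for illustration. Be aware that the epsilon-insensitive loss usually seen in support vector regression subtracts the threshold, giving max(0, Abs(error) - M); the version below follows the definition used here, which does not.

```python
def epsilon_insensitive_loss(predicted_margin: float, actual_margin: float, m: float) -> float:
    """Loss as defined in this blog: zero inside the M-point band,
    the full absolute error outside it (no subtraction of M)."""
    abs_error = abs(predicted_margin - actual_margin)
    return 0.0 if abs_error < m else float(abs_error)


# Invented example: actual margin of 14 points, M = 5.
for prediction in (10, 19, 60):
    print(prediction, epsilon_insensitive_loss(prediction, 14, 5))
```

Unlike the bathtub loss, a 46-point miss is now penalised far more heavily than a 5-point miss.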

As well, since I was already considering some new loss functions, I thought I'd add a few more, specifically those of the form:

Loss = (Predicted Margin - Actual Margin)^k, for different k, and
Loss = Abs(Predicted Margin - Actual Margin)^k, for different k

For these loss functions, every point of error matters; what differentiates them is the rate at which loss increases with every additional point of error.
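To illustrate how the exponent k changes the rate at which loss grows, here is a small sketch. The function names are hypothetical, and restricting the raw-error form to even integer k is my assumption: with odd k the "loss" can be negative, and with fractional k it is not a real number for negative errors.

```python
def raw_power_loss(predicted_margin: float, actual_margin: float, k: int) -> float:
    """(Predicted - Actual)^k. Assumed to be used only with even integer k,
    so that the result is never negative."""
    if k % 2 != 0:
        raise ValueError("this sketch assumes an even integer k")
    return (predicted_margin - actual_margin) ** k


def abs_power_loss(predicted_margin: float, actual_margin: float, k: float) -> float:
    """Abs(Predicted - Actual)^k, valid for any positive k."""
    return abs(predicted_margin - actual_margin) ** k


# Invented example: how the loss grows as the error goes from 1 to 6 to 36 points.
for k in (0.5, 1, 1.5, 2):
    losses = [round(abs_power_loss(error, 0, k), 2) for error in (1, 6, 36)]
    print(f"k = {k}: {losses}")
```

With k below 1 each extra point of error adds less and less loss; with k above 1 each extra point adds more and more.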

Again I used Eureqa to optimise non-linear Margin Prediction models, with bookmaker prices, team MARS Ratings, and game Interstate Status as the only inputs, and then determined how well each model performed on a holdout sample of games across seasons 2000-2011.

Here are the results:

The first block of this table is for models using loss functions derived from actual or absolute errors (ie raw differences between the predicted and actual game margins), the second block is for models using "bathtub" loss functions, and the third block is for models using the epsilon-insensitive loss functions.

In the upper section of each block I've shown the percentage of holdout games for which the relevant model's predictions were within X points of the actual margin. Here higher is better, and higher is greener. So, for example, the model built using the Absolute Error loss function produced predictions within 6 points of the actual margin in 14.3% of all holdout games; the green colouring of this cell indicates that this was a reasonably strong performance in the context of all the models shown here.

The lower section of each block provides the ranking of each model, across all 37 of the models, for the relevant metric. So, for example, the model built using the epsilon-insensitive loss function with M=5 ranks 3rd in terms of producing predictions within 3 points of the actual margin.
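For anyone wanting to reproduce metrics of this kind, the following sketch shows one way to compute them. Everything here is an assumption on my part: the function names, the use of a strict "less than X points" test, reading MAPE as the mean absolute prediction error in points, and the invented margins for two hypothetical models.

```python
def pct_within(predicted, actual, x):
    """Percentage of games with absolute prediction error less than x points."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) < x)
    return 100.0 * hits / len(actual)


def mape(predicted, actual):
    """Mean absolute prediction error, in points per game."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rank_models(scores, higher_is_better=True):
    """Map each model name to its rank (1 = best). Ties keep dictionary order."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Invented data: four games and two hypothetical models.
actual = [12, -30, 45, 3]
predictions = {
    "often_close": [13, -28, 5, 48],    # errors of 1, 2, 40 and 45 points
    "never_far": [22, -18, 30, 14],     # errors of 10, 12, 15 and 11 points
}

within_6 = {name: pct_within(p, actual, 6) for name, p in predictions.items()}
mapes = {name: mape(p, actual) for name, p in predictions.items()}

print(within_6, rank_models(within_6))
print(mapes, rank_models(mapes, higher_is_better=False))
```

In this toy example the model that is often very close ranks first on the within-6-points metric but second on MAPE, because its two large misses dominate its average error.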

For me, here are some of the key results in this table:

The model built using the squared error loss function generally does well, but it achieves this result at the cost of only infrequently being very close to the actual result. In less than 2% of games in the holdout sample does it predict within a point of the actual margin, ranking it 35th on this measure, and in only 13.5% of games in the holdout sample does it predict within 6 points of the actual margin, ranking it 20th on this metric. Based on MAPE, however, it's the 2nd-best model of all.
Using error to the power of 4 - or squared-squared error if you like - as the loss function produces a model that's moderately good at producing predictions within about 4 goals of the actual result, but quite poor otherwise. Overall, its MAPE ranks 26th. This model is almost a mirror-image of the squared error based model.
Using absolute error as the loss function produces a quite competent prediction model, though one that's somewhat poor at predicting within 3 points of the actual margin. On this metric it ranks 24th. It's ranked in the top 20 on all other metrics however, including being ranked 10th on MAPE.
Models built using powers of the absolute error are generally even better performers than the model built using absolute error alone. Best in terms of MAPE is the model that uses the square root of the absolute error as the loss function; it ranks 5th on that metric. Next best in terms of the MAPE metric is the model that uses absolute error to the power of 3/2. It finishes 7th on that metric, but is 3rd on producing predictions that are within 3 points of the final margin.
Bathtub-shaped loss functions, as noted in the previous blog, produce margin predictors that tend to shine at producing predictions within a fixed threshold of the actual result, but whose performance tends to fall away rapidly above that threshold, so much so that the best of these models finishes only 13th on the MAPE metric.
Some of the models built using epsilon-insensitive loss functions are impressive all-around performers. In particular, the model which is insensitive to errors of less than 5 points does especially well. It ranks 3rd in terms of predictions within 3 points of the actual margin, 4th in terms of predictions within 18 points, 1st in terms of predictions within 36 points and within 50 points, and 1st overall in terms of MAPE. (As something of a curiosity, and not shown here, this model is relatively poor at predicting the winners of each game, correctly predicting less than 67% of all games in the holdout sample, which is more than 1% worse than the best-performed model, thus demonstrating yet again the difference between above-average margin prediction prowess and above-average result prediction prowess.)

Scanning the "Rank" section of each block suggests that a model's rank on MAPE is more determined by its rank on predicting within, say, 100 points of the final margin than by its rank on predicting within, say, 12 points. Correlating each model's ranks in each row of the "Rank" section with its rank on MAPE supports this perception numerically, as the table at left highlights.

This table shows that a model's ranking in terms of producing predictions within 6 goals of the actual final margin is an excellent indicator of its ranking on MAPE. Here the correlation is +0.79.

By way of contrast, a model's ranking on producing predictions within 3 goals of the actual final margin correlates only moderately (+0.33) with its ranking in terms of MAPE, and a model's ranking on producing predictions within half a goal of the actual final margin correlates very weakly (+0.11) with its ranking in terms of MAPE.
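A correlation between two sets of ranks can be computed as an ordinary Pearson correlation applied to the rank values, which is the Spearman rank correlation when there are no ties. The sketch below does this with the standard library only. I'm assuming this is the kind of correlation involved, and the ranks shown are invented for five hypothetical models, not taken from the tables in this blog.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented ranks for five hypothetical models.
mape_rank = [1, 2, 3, 4, 5]
within_36_rank = [1, 3, 2, 4, 5]   # broadly agrees with the MAPE ranking
within_3_rank = [4, 1, 5, 2, 3]    # bears little relation to the MAPE ranking

print(round(pearson(within_36_rank, mape_rank), 2))
print(round(pearson(within_3_rank, mape_rank), 2))
```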

Lastly, here are the models that Eureqa produced for each of the loss functions considered here (click for a larger version).

Some of the epsilon-insensitive models are, arguably, more complex than some of the other models, but none seem to me to be excessively complex. In a future blog we might take a closer look at how the predictions of some of the stronger models change as the variables on which they depend are varied, but for now, enough.

UPDATE: The results for this blog were also affected by the glitch in the software that produced them, which allowed holdout data to bleed into the training set.

The updated results, which are no longer affected by this glitch, are as follows:

The results are, as you'd expect, not as strong as we saw originally. The best MAPE, for example, is now 29.76 points per game rather than 29.46 points per game, which rates it on a par with the TAB Sportsbet bookmaker rather than substantially superior to him. (In hindsight, maybe that should have been a clue that something was amiss.)

Interestingly, it's still the epsilon-insensitive model with M=5 that emerges as the best of all models considered - though the epsilon-insensitive models with M=3 and M=12, and the model with the error metric that is the square root of the absolute error, are only barely inferior.

We also find, as we did with the models based on bathtub-shaped loss functions, that the best-fitting models for epsilon-insensitive error functions are now far simpler and depend only on TAB Sportsbet bookmaker prices.

When you consider that the best of all these models is the epsilon-insensitive model with M=5, it's amazing that it's as simple as 37.47 + 3.184*Opponent_Price - 75.08/Opponent_Price. So, with just a single piece of information - the TAB Sportsbet bookmaker's starting price for the Away team - we can predict the correct margin with an MAPE of 29.76 points per game.
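To get a feel for how this model behaves, here is a short sketch that evaluates it for a few prices. The function name and the prices are my own illustrative choices, and I'm assuming Opponent_Price is a decimal head-to-head price (so always greater than 1).

```python
def predicted_margin(opponent_price: float) -> float:
    """Evaluate the fitted M=5 epsilon-insensitive model quoted above."""
    return 37.47 + 3.184 * opponent_price - 75.08 / opponent_price


# Invented prices, from a short-priced opponent to a long-priced one.
for price in (1.25, 1.50, 1.90, 2.50, 4.00, 8.00):
    print(f"Opponent at ${price:.2f}: predicted margin {predicted_margin(price):+.1f} points")
```

Both price terms increase as the opponent's price lengthens, so the predicted margin rises steadily with Opponent_Price, changing sign at a price a little under $1.90.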

For completeness, here are the correlations between the ranks for the predictors in terms of MAPE with their ranks in terms of predicting within X points of the final result. Still we find that predictors' MAPE ranks are most correlated with their ability to predict within 6 goals (or more) of the actual result.

To be a good margin predictor, even in terms of MAPE it seems, it's far more important to never be far from the actual result than it is to often be very close to the actual result.