Bookmakers, love them or lose to them, are good at their basic job, which is accurately estimating the probability of outcomes, and they give clues about their probability estimates in the prices they set. The problem is, those clues are cloaked in profit.

VIGORISH

Let’s imagine a contest with only two outcomes - home team win or away team win - and a bookmaker who assesses the probability of the former to be 60% (and, by implication, the probability of the latter to be 40%, because the probabilities across all outcomes must sum to 1).

Were he or she to set what are called ‘fair prices’, then the price for the home team would be 1/0.6 = $1.67 and that for the away team 1/0.4 = $2.50. At those prices, assuming his or her probability estimates are correct, the bookmaker stands to make, on average, no money, regardless of the amounts wagered on either team.

To see this, imagine, for example, that $100 is wagered on the home team and $300 on the away team. The expected return to the bookmaker is then: 60% x ($300 - $100 x 0.67) + 40% x ($100 - $300 x 1.5) = 0 (to within rounding, since 0.67 stands in for 2/3). More generally, if $H is wagered on the home team and $A on the away team, the expected return to the bookmaker is: 60% x ($A - $H x 0.67) + 40% x ($H - $A x 1.5) = 0.6 $A - 0.4 $H + 0.4 $H - 0.6 $A = 0
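As a quick check of that arithmetic, here is a minimal Python sketch. The function name and its arguments are purely illustrative; it assumes decimal prices, so a winning wager of $S at price P is paid $S x P, stake included.

```python
def bookmaker_expected_return(prices, probs, stakes):
    """Expected bookmaker profit: all stakes collected, less the expected payout."""
    total_staked = sum(stakes)
    # If outcome i occurs (with probability p), wagers on it are paid stake * price.
    expected_payout = sum(p * s * price for price, p, s in zip(prices, probs, stakes))
    return total_staked - expected_payout

probs = [0.6, 0.4]                    # home, away
fair_prices = [1 / p for p in probs]  # about $1.67 and exactly $2.50

# $100 on the home team and $300 on the away team, as in the text.
fair_return = bookmaker_expected_return(fair_prices, probs, [100, 300])
print(round(fair_return, 6))
```

Changing the amounts wagered leaves the result at zero (up to floating-point error), which is the point of the general formula.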

That is a decidedly uncommercial endeavour, and the bookmaker addresses this by levying what’s called vigorish on each price, which serves to lower it from the “fair” value.

So, in the current case, the actual market might be home team at $1.60 and away team at $2.35. At those prices the bookmaker’s expected profit lies between 4% of turnover (if all money is wagered on the favourite) and 6% of turnover (if all money is wagered on the underdog). If equal amounts are wagered on both teams, the expected return is 5% of turnover.
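Those three figures can be reproduced with a short sketch, assuming the bookmaker’s 60%/40% estimates are correct. The helper name expected_margin is hypothetical and exists only for this illustration.

```python
def expected_margin(prices, probs, stake_shares):
    """Expected bookmaker profit as a fraction of turnover.

    stake_shares gives the fraction of all money wagered on each outcome.
    """
    expected_payout = sum(
        share * p * price for price, p, share in zip(prices, probs, stake_shares)
    )
    return 1 - expected_payout

market = [1.60, 2.35]   # home (favourite), away (underdog)
probs = [0.6, 0.4]

all_on_favourite = expected_margin(market, probs, [1.0, 0.0])  # about 0.04
all_on_underdog = expected_margin(market, probs, [0.0, 1.0])   # about 0.06
even_split = expected_margin(market, probs, [0.5, 0.5])        # about 0.05
```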

We can calculate the total vigorish (or ‘vig’) in the two prices by first calculating the overround using:

Overround = 1/Home Price + 1/Away Price = 105.1%

The total vigorish then, which is the bookmaker’s expected profit percentage if wagers are made in proportion to the true probabilities (so, here, 60% on the favourite and 40% on the underdog) is given by:

Vigorish = (Overround - 1)/Overround = 4.8%.

For the most part the values for Overround - 1 and Vigorish are very similar, so they tend to be used interchangeably. Each gives a rough idea of the bookmaker’s expected profit, and larger values are worse for the wagerer and better for the bookmaker.
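The two definitions above translate directly into code. This is a small sketch with illustrative function names, applied to the $1.60 and $2.35 market.

```python
def overround(prices):
    """Sum of the reciprocals of the decimal prices."""
    return sum(1 / price for price in prices)

def vigorish(prices):
    """Total vig: (Overround - 1) / Overround."""
    total = overround(prices)
    return (total - 1) / total

prices = [1.60, 2.35]
print(f"Overround: {overround(prices):.1%}")   # 105.1%
print(f"Vigorish:  {vigorish(prices):.1%}")    # 4.8%
```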

A VIGORISH REMOVAL SERVICE

Absent a model of how bookmakers differentially levy vigorish on the different outcomes, being able to calculate total vig won’t help us estimate a bookmaker’s true underlying probability estimates.

The simplest model assumes that bookmakers levy vigorish equally on all outcomes, which here would mean that a bookmaker’s underlying probability estimates could be found using:

Estimated Home Probability = Away Price / (Home Price + Away Price)
Estimated Away Probability = Home Price / (Home Price + Away Price)

To give a practical example of how this bookmaker model works, consider our earlier discussion where the fair prices were $1.67 and $2.50. We can add a fixed overround of 5% (and a fixed vig of 4.8%) to these prices by dividing each price by 1.05 to get $1.59 and $2.38 as our final prices. Note that $2.38 / ($2.38 + $1.59) = 60%, as required.
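In code, the equal-vig model amounts to normalising the reciprocal prices so that they sum to 1, which for two outcomes is algebraically the same as the “other price over the sum of prices” formulas above. The sketch below uses a hypothetical function name and is not the implied package’s own code.

```python
def basic_implied_probabilities(prices):
    """Remove vig by assuming it is levied equally on every outcome."""
    inverse_prices = [1 / price for price in prices]
    total = sum(inverse_prices)            # this is the overround
    return [inv / total for inv in inverse_prices]

fair_prices = [1 / 0.6, 1 / 0.4]                  # about $1.67 and $2.50
market_prices = [p / 1.05 for p in fair_prices]   # about $1.59 and $2.38

home_prob, away_prob = basic_implied_probabilities(market_prices)
print(round(home_prob, 4), round(away_prob, 4))   # 0.6 0.4

# The two-outcome shortcut from the text gives the same home estimate.
home_price, away_price = market_prices
shortcut_home = away_price / (home_price + away_price)
```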

This is a perfectly serviceable model but fails to accurately model, for example, bookmakers who demonstrate a favourite-longshot bias by imposing less vig on favourites and more on longshots or underdogs in the belief that punters have an inherent preference for longshots and can’t price them accurately.
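The earlier $1.60 and $2.35 market is a small instance of this: relative to the assumed true probabilities of 60% and 40%, it carries about 4% vig on the favourite and 6% on the underdog. The sketch below (illustrative names only) shows that the equal-vig calculation then slightly understates the favourite and overstates the underdog.

```python
true_probs = [0.6, 0.4]
market_prices = [1.60, 2.35]   # home (favourite), away (underdog)

# Vig on each outcome: the expected bookmaker margin on $1 wagered on it.
vig_by_outcome = [1 - p * price for p, price in zip(true_probs, market_prices)]

# Equal-vig (Basic) estimates: normalised reciprocal prices.
inverse_prices = [1 / price for price in market_prices]
basic_estimates = [inv / sum(inverse_prices) for inv in inverse_prices]

for label, truth, estimate in zip(["home", "away"], true_probs, basic_estimates):
    print(f"{label}: true {truth:.3f}, Basic estimate {estimate:.3f}")
```

The gap here is only about half a percentage point, but it grows as the vig becomes more lopsided.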

There are a number of alternative bookmaker models that can cater for this and for other assumptions about markets and bookmaker motivations, and they have helpfully been coded in an R package named implied, which I’ll be using for the rest of this post.

THE DATA

For this analysis I’ll again be using the AFL data from the AusSportsBetting site, today focusing on the opening and closing home and away head-to-head prices of which there are 2,375 sets of pairs currently in the data.

My basic question is: how well do the different methods perform in removing vigorish from those prices?

Now we don’t know what the bookmakers’ true probability estimates were for any game, so we can’t use that as our performance metric, but we can proceed by assessing the quality of the probability estimates generated given that we know the actual outcomes of all 2,375 games (the implicit assumption being that bookmakers are extremely well-calibrated, so probability estimates that mirror reality well by definition will also mirror the bookmakers well).

There are also a number of metrics that we could use to measure the quality of these probability estimates (for example, their Brier Score), but we’ll be using the Log Probability Score (LPS), which assigns a score to each probability estimate of 1 + log(p,2), where p is the estimated probability for the actual outcome, and the 2 means we’re working in base two.

So, for example, if we’d estimated a home team victory as a 60% chance then, if the home team won, our LPS would be 1+log(0.6,2) = 0.263. If, instead, the home team lost, our LPS would be 1+log(1-0.6,2) = -0.322.
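A minimal sketch of the LPS calculation for a single game follows. The function name and the choice to pass the home team probability plus a win flag are assumptions made for illustration.

```python
from math import log2

def log_probability_score(p_home, home_won):
    """LPS = 1 + log2(probability assigned to the outcome that occurred)."""
    p_actual = p_home if home_won else 1 - p_home
    return 1 + log2(p_actual)

print(round(log_probability_score(0.6, True), 3))    # 0.263
print(round(log_probability_score(0.6, False), 3))   # -0.322
print(log_probability_score(0.5, True))              # 0.0
```

The added 1 means a coin-flip forecast of 50% scores exactly zero whatever happens, so positive scores indicate forecasts that did better than that.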

The LPS metric is motivated by information theory and has a number of desirable characteristics including the fact that it rewards probability estimates close to 1 for outcomes that do occur, and probability estimates close to 0 for outcomes that don’t occur, and it does so in a monotonic way (so a 90% estimate scores better than an 89% estimate for an outcome that does occur). It also can’t be ‘gamed’ in the sense that a forecaster who believes that the true probability is p can’t do better than provide an estimate of p.
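The “can’t be gamed” property can be illustrated numerically. The sketch below assumes a true home team probability of 60% and searches a grid of candidate forecasts for the one with the highest expected LPS. The names are illustrative, and a grid search is just a convenient demonstration, not a proof.

```python
from math import log2

def expected_lps(forecast, true_prob):
    """Expected LPS of a forecast when the event truly occurs with true_prob."""
    score_if_occurs = 1 + log2(forecast)
    score_if_not = 1 + log2(1 - forecast)
    return true_prob * score_if_occurs + (1 - true_prob) * score_if_not

true_prob = 0.6
candidates = [i / 100 for i in range(1, 100)]   # forecasts of 1% to 99%
best_forecast = max(candidates, key=lambda f: expected_lps(f, true_prob))
print(best_forecast)   # 0.6
```

Shading the forecast in either direction, say to 55% or 65%, lowers the expected score.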

LPS is more punitive than the Brier Score for probability estimates near 0 and 1 when the outcome is the opposite of the one forecast as more likely. Some people prefer this characteristic of the LPS, while others claim that it tends to encourage less bold forecasts.
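To make the difference concrete, here is a sketch comparing the two scores for a confident forecast that turns out wrong. It assumes the single-outcome form of the Brier Score, (p - outcome) squared, where lower is better; LPS is the reverse, with higher being better. Function names are illustrative.

```python
from math import log2

def brier_score(p_home, home_won):
    """Squared error of the home probability (lower is better, at most 1)."""
    outcome = 1.0 if home_won else 0.0
    return (p_home - outcome) ** 2

def log_probability_score(p_home, home_won):
    """1 + log2 of the probability given to what happened (higher is better)."""
    return 1 + log2(p_home if home_won else 1 - p_home)

# A 90% and then a 99% home forecast, with the home team losing both times.
for p_home in (0.90, 0.99):
    print(p_home,
          round(brier_score(p_home, False), 4),
          round(log_probability_score(p_home, False), 3))
```

Moving from 90% to 99% barely changes the Brier Score of the wrong forecast, since it is capped at 1, whereas the LPS falls by more than three points and has no lower bound.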

(FWIW, the subsequent analyses would reach much the same conclusions if we, instead, used the Brier Score).

THE RESULTS

The table below records the mean LPS that each method achieved by season on both the closing and opening prices.

Unsurprisingly, no bookmaker model is able to turn opening prices into the best probability estimates, and the Balanced Book model does worst of all with this data, although the Basic model (which is the one I outlined earlier) struggles in a few seasons as well when forced to use only opening prices.

Conversely, however, both of these models do quite well when presented with closing prices (with the exception of Balanced Book in 2021).

Forced to choose a single model, then, it’s hard to go past the Basic model on this data.

MAYBE A HYBRID?

Maybe it’s the case that, if there is a favourite-longshot bias in AFL head-to-head pricing, while the Basic method does well overall, it does less well for extreme prices because it can’t cater for the higher- or lower-than-usual levels of vigorish in these prices.

To test this, let’s restrict the analysis to games where the closing home team price was in the bottom decile ($1.14 or less) or top decile ($4.00 or more).
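A sketch of that kind of filter is shown below. The prices are made up purely to exercise the function and are not from the AusSportsBetting data, and the choice of the “inclusive” quantile method is an assumption, since different decile conventions can give slightly different cut-offs.

```python
from statistics import quantiles

def extreme_price_indices(home_prices):
    """Indices of games whose home price is in the bottom or top decile."""
    cuts = quantiles(home_prices, n=10, method="inclusive")  # 9 cut points
    low, high = cuts[0], cuts[-1]   # 10th and 90th percentiles
    keep = [
        i for i, price in enumerate(home_prices)
        if price <= low or price >= high
    ]
    return keep, low, high

# Hypothetical closing home prices, for illustration only.
made_up_prices = [1.05, 1.20, 1.35, 1.50, 1.70, 1.90, 2.20, 2.60, 3.20, 4.50, 6.00]
keep, low, high = extreme_price_indices(made_up_prices)
print(low, high, keep)
```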

The Basic and Balanced Book models both do quite well here, again, when allowed to use closing prices, and show no evidence of being less capable of dealing with more extreme prices.

CALIBRATION

We can also check the quality of the Basic model probability estimates from closing prices by creating what is called a ‘calibration plot’. A well-calibrated model is one where outcomes with an estimated probability of X% actually occur close to X% of the time, and the calibration plot allows us to assess how true this is of a model by:

Creating “bins” of estimated probabilities (eg bin 1 is estimates under 10%, bin 2 is estimates from 10% to under 15%, and so on). Usually, 10 such bins are created, each of approximately the same size
Calculating the average estimated probability for each bin
Calculating the average outcome probability for the games that are in each bin
Plotting the result
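The first three steps can be sketched as below, leaving the plotting aside. The data is simulated (outcomes are drawn so that the forecasts are well calibrated by construction) and stands in for the real estimates and results; the function name and the equal-count binning rule are assumptions.

```python
import random

def calibration_table(probs, outcomes, n_bins=10):
    """Sort by estimated probability, split into equal-sized bins, and
    return (mean estimate, observed win rate, count) for each bin."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue   # can only happen with fewer games than bins
        mean_prob = sum(p for p, _ in chunk) / len(chunk)
        win_rate = sum(o for _, o in chunk) / len(chunk)
        table.append((mean_prob, win_rate, len(chunk)))
    return table

# Simulated, not real, data: each outcome is 1 with its own estimated probability.
rng = random.Random(0)
sim_probs = [rng.uniform(0.05, 0.95) for _ in range(5000)]
sim_outcomes = [1 if rng.random() < p else 0 for p in sim_probs]

for mean_prob, win_rate, count in calibration_table(sim_probs, sim_outcomes):
    print(f"{mean_prob:.3f}  {win_rate:.3f}  {count}")
```

Plotting the first column against the second gives the calibration plot; for a well-calibrated set of estimates the points sit near the 45-degree line.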

Doing this for the Basic method using closing prices produces the chart at right, which suggests that the probability estimates we get from this method are well-calibrated across all bins (ie across large, medium, and small probability estimates). This is evidenced by the fact that the points in the chart map quite closely to the 45-degree diagonal.

So that would seem to be that then - the Basic method, for all its simplicity, seems to do a totally adequate job of converting bookmaker closing prices into well-calibrated probability estimates.

(It would also suggest that there is no favourite-longshot bias in this data, or that it is so small that it is virtually undetectable given the sample size we have. That’s perhaps a topic for another blog).

SUMMARY AND CONCLUSION

We set out to investigate how we might transform bookmaker head-to-head prices into estimates of the bookmaker’s true probability assessments by removing the vig incorporated in them, and discovered that there are a number of ways of doing this.

Empirically, for the AFL bookmaker data collected by AusSportsBetting, we found that a perfectly adequate way of performing this transformation was to apply what’s called the Basic method to the closing prices. That method simply estimates the underlying probability of an outcome as the price for the “other” outcome divided by the sum of the two prices.

That method not only produces well-calibrated probability estimates overall, but importantly also does it for all price points and probability ranges.

Removing Vig from Bookmaker Prices

Matter of Stats

Contact Me

I can be contacted via Tony.Corke@gmail.com
