Predicting the Lead at Every Change

The idea for this blog came from Friend of MatterOfStats, Brett, who, prompted by earlier posts such as this one in which I modelled only the final game margin, asked via e-mail whether I'd also modelled the game margin at the end of Quarters 1, 2 and 3.

It's a great idea and immediately raises questions about the likely trajectory of a team's victory or defeat. If the home team is expected to win, say, by 40 points, are they most likely to extend their lead in a linear fashion and be up by 10 points at Quarter Time, 20 points at Half Time and 30 points at Three-Quarter Time, or is a more parabolic or even a non-smooth arc to victory more probable?

To model these outcomes I've decided to take the quantile regression approach (see this blog for details) for another spin, this time using as the lone regressor the bookmaker's pre-game implicit probability of a home team victory, calculated using the overround-equalising approach.
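For readers wanting to reproduce the regressor, here's a minimal Python sketch of that conversion. It assumes that the overround-equalising approach divides each team's raw implied probability (1 / price) by the sum of both teams' raw implied probabilities, which spreads the bookmaker's overround in proportion to each team's chances. The function name and the $1.50 / $2.60 prices are hypothetical.

```python
def implicit_home_probability(home_price: float, away_price: float) -> float:
    """Overround-equalising implicit probability of a home team win.

    Assumption: the bookmaker's overround is shared in proportion to each
    team's raw implied probability, so normalising by the total removes it.
    """
    raw_home = 1.0 / home_price
    raw_away = 1.0 / away_price
    total = raw_home + raw_away  # equals 1 + total overround
    return raw_home / total


# Hypothetical head-to-head market: home team at $1.50, away team at $2.60
p_home = implicit_home_probability(1.50, 2.60)
print(round(p_home, 3))  # → 0.634
```

With two teams this reduces to away price / (home price + away price), so the home team here has an implicit probability of 2.6 / 4.1.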

I'll fit four models in total, one for each quarter's end, all using data for every game from the 2006 to 2013 seasons, and each of the simple linear form:

Home Team Lead at End of Quarter X = a + b x Home Team's Implicit Probability

Each model will be estimated at 19 percentiles, starting at 5%, ending at 95%, with 5% increments.
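The post doesn't say which software did the fitting, so what follows is only a pure-Python sketch of estimating one such model at the 19 percentiles. It uses made-up data in place of the 2006 to 2013 games, and a brute-force search that relies on a known property of quantile regression: with one regressor plus an intercept, some optimal fitted line passes through two of the observations. That search is O(n^3), so for real data you'd use a linear-programming implementation such as `rq` in R's quantreg package or `QuantReg` in statsmodels. The helper names are hypothetical.

```python
import random


def pinball_loss(residuals, tau):
    """Quantile (check) loss: tau*u for u >= 0 and (tau - 1)*u for u < 0."""
    return sum(u * (tau - (1.0 if u < 0 else 0.0)) for u in residuals)


def fit_quantile_line(xs, ys, tau):
    """Exact quantile regression of y on x with an intercept.

    Some optimal line interpolates two observations, so for a small sample
    we can search every pair of points. O(n^3): for illustration only.
    """
    best = None
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] == xs[j]:
                continue  # a vertical line is not a valid fit
            b = (ys[j] - ys[i]) / (xs[j] - xs[i])
            a = ys[i] - b * xs[i]
            loss = pinball_loss([y - (a + b * x) for x, y in zip(xs, ys)], tau)
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best[1], best[2]


# Made-up (home probability, quarter-end lead) pairs, NOT real game data
rng = random.Random(2013)
xs = [rng.uniform(0.05, 0.95) for _ in range(40)]
ys = [-50 + 110 * x + rng.gauss(0, 30) for x in xs]

# The 19 percentiles: 5%, 10%, ..., 95%
taus = [k / 100 for k in range(5, 100, 5)]
fits = {tau: fit_quantile_line(xs, ys, tau) for tau in taus}
for tau, (a, b) in fits.items():
    print(f"{tau:.2f}: a = {a:7.2f}, b = {b:7.2f}")
```

Each of the 19 fits is independent, so nothing forces the fitted lines to stay in order; with small samples, lines for neighbouring percentiles can cross.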

FITTED MODELS

For anyone keen to perform their own calculations, here are the fitted coefficients of each model at each selected percentile.

So if, for example, you wanted to calculate the fitted 50th percentile or median lead at the end of Quarter 1 for a home team with a pre-game implicit probability of 75%, the calculation would be -12.90 + 0.75 x 27.07 = 7.4, meaning that there's an estimated 50% chance that such a home team would lead by this amount or less at the end of the 1st Quarter. The fitted median leads for this same home team at Half Time, Three-Quarter Time and Full Time would be 14.8 points, 22.5 points, and 29.5 points. For this team then, the trajectory to a roughly 30 point victory is quite linear.
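The arithmetic above is simple enough to script. This short sketch evaluates the median Quarter 1 model with the two coefficients quoted in the text (-12.90 and 27.07) for the five notional home teams used later in this post; `fitted_lead` is a hypothetical helper name.

```python
def fitted_lead(a: float, b: float, home_prob: float) -> float:
    """Fitted home team lead at one percentile: a + b x probability."""
    return a + b * home_prob


# 50th percentile (median) Quarter 1 coefficients quoted in the text
A_Q1_MEDIAN, B_Q1_MEDIAN = -12.90, 27.07

for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    lead = fitted_lead(A_Q1_MEDIAN, B_Q1_MEDIAN, p)
    print(f"{p:.0%} home team: median Quarter 1 lead {lead:+.1f}")
```

This prints -10.2, -6.1, +0.6, +7.4 and +11.5 points, which round to the Quarter Time median margins listed under SCENARIOS.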

For a different home team, one that's only an equal favourite with its opponent (ie has a 50% pre-game probability), the fitted median leads at the end of each term would instead be 0.6 points, 1.0 points, 1.9 points and 1.9 points, implying a much less linear path to a small final victory.

These two examples reveal that home teams with differing pre-game probabilities are expected to map out quite different scoring trajectories.

SCENARIOS

As a way of further understanding these models I fitted the cumulative distribution functions (CDFs) as at the end of all four quarters for five notional home teams with pre-game probabilities of 10%, 25%, 50%, 75% and 90%. These probabilities would apply to home teams with prices of about $9.50, $3.80, $1.90, $1.27 and $1.06 assuming a market with 5% total overround.
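Those prices come from inverting the overround-equalising calculation: if the total overround is spread in proportion to each team's probability, a team with probability p is priced at 1 / (p x (1 + overround)). A quick check in Python, using a hypothetical helper name:

```python
def price_from_probability(prob: float, total_overround: float = 0.05) -> float:
    """Price implied by a probability when the total overround is spread
    in proportion to each team's probability (overround-equalising)."""
    return 1.0 / (prob * (1.0 + total_overround))


for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"{p:.0%}: ${price_from_probability(p):.2f}")
```

This prints $9.52, $3.81, $1.90, $1.27 and $1.06, matching the approximate prices above.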

Here are the CDFs for the 1st Quarter:

The red line maps the CDF for a home team with a pre-game implicit probability of 10%. It crosses the 50th percentile at a home team deficit of about 10 points, so a home team sporting that pre-game probability would be expected to trail by this margin or more about half the time.

The equivalent margins for the other home team probabilities are:

  • About a 6 point deficit for a home team with a pre-game 25% probability
  • About a 1 point lead for a home team with a pre-game 50% probability
  • About a 7 point lead for a home team with a pre-game 75% probability
  • About an 11 point lead for a home team with a pre-game 90% probability

It's noteworthy, I think, that the largest and smallest of these median leads differ by only about 21 points, or roughly 3½ goals.

By Half Time we find that the median margins have spread out.

These median margins now span a range that's wider than 7 goals, more than double the range at Quarter Time.

  • About a 21 point deficit for a home team with a pre-game 10% probability
  • About a 13 point deficit for a home team with a pre-game 25% probability
  • About a 1 point lead for a home team with a pre-game 50% probability
  • About a 15 point lead for a home team with a pre-game 75% probability
  • About a 23 point lead for a home team with a pre-game 90% probability

Note that the likelihood of a 3/1 home team underdog (ie one with a 25% pre-game probability) leading at Half Time is about 35%, and that of a 9/1 home team underdog (a 10% pre-game probability) is less than 20%.
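The post doesn't spell out how these probabilities were read off the CDFs. One simple approach, sketched here as an assumption rather than the author's method, is to interpolate linearly between the two fitted percentiles whose leads bracket zero. The fractional-odds helper shows why a 3/1 underdog corresponds to a 25% pre-game probability and a 9/1 underdog to 10%. All names are hypothetical, and the 19 fitted leads below are invented, not taken from the models.

```python
from bisect import bisect_left


def prob_from_fractional_odds(num: float, den: float = 1.0) -> float:
    """Probability implied by fractional odds against, e.g. 3/1 -> 0.25."""
    return den / (num + den)


def prob_leading(taus, fitted_leads):
    """Estimate P(lead > 0) from fitted quantiles of the home team lead.

    Linearly interpolates the CDF between the two percentiles whose fitted
    leads bracket zero. Assumes fitted_leads is sorted (no quantile
    crossing) and that zero lies between its smallest and largest values.
    """
    k = bisect_left(fitted_leads, 0.0)  # first fitted lead >= 0
    if k == 0 or k == len(fitted_leads):
        raise ValueError("zero lies outside the fitted percentile range")
    lo_q, hi_q = fitted_leads[k - 1], fitted_leads[k]
    lo_t, hi_t = taus[k - 1], taus[k]
    cdf_at_zero = lo_t + (hi_t - lo_t) * (0.0 - lo_q) / (hi_q - lo_q)
    return 1.0 - cdf_at_zero  # chance of leading rather than trailing or level


taus = [k / 100 for k in range(5, 100, 5)]
# Invented half-time leads for one notional home team, one per percentile
hypothetical_leads = [-50 + 4 * i for i in range(19)]

print(prob_from_fractional_odds(3))  # → 0.25
print(round(prob_leading(taus, hypothetical_leads), 3))  # → 0.325
```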

By Three-Quarter Time the spread has widened further still.

The median margins are now:

  • About a 31 point deficit for a home team with a pre-game 10% probability
  • About a 19 point deficit for a home team with a pre-game 25% probability
  • About a 2 point lead for a home team with a pre-game 50% probability
  • About a 23 point lead for a home team with a pre-game 75% probability
  • About a 35 point lead for a home team with a pre-game 90% probability

So, the range is now 11 goals, and the increment in this range between Half Time and Three-Quarter Time (21 points) is only slightly smaller than the increment between Quarter Time and Half Time (23 points).

Lastly, using the fourth and final model, we find that the span of the median margins grows by about 23 points in the final term, very slightly more than it grows in the second term.
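Because each model is linear, the gap between the medians of the 90% and 10% home teams at any break is just the median model's slope multiplied by 0.8; the intercept cancels. This sketch applies that to the two median slopes quoted in this post (27.07 for Quarter 1, 110.76 for Full Time), using the usual 6 points per goal. The function and constant names are hypothetical.

```python
def median_span(b: float, low_prob: float = 0.10, high_prob: float = 0.90) -> float:
    """Gap between the fitted medians of the strongest and weakest notional
    home teams. With a model of the form a + b x p, the intercept cancels."""
    return b * (high_prob - low_prob)


B_MEDIAN_Q1 = 27.07   # median slope, Quarter 1 model (quoted in the text)
B_MEDIAN_FT = 110.76  # median slope, Full Time model (quoted in the text)

span_q1 = median_span(B_MEDIAN_Q1)
span_ft = median_span(B_MEDIAN_FT)
print(f"Quarter Time span: {span_q1:.1f} points ({span_q1 / 6:.1f} goals)")
print(f"Full Time span: {span_ft:.1f} points ({span_ft / 6:.1f} goals)")
print(f"Average growth per term: {(span_ft - span_q1) / 3:.1f} points")
```

The spans come out at about 21.7 and 88.6 points, so the span grows by roughly 67 points over the last three terms, or about 22.3 points per term, in line with the 23, 21 and 23 point increments described above.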

By the end of the game we expect that 50% of the time:

  • A home team with a pre-game 10% probability will lose by 42 points or more
  • A home team with a pre-game 25% probability will lose by 26 points or more
  • A home team with a pre-game 50% probability will win by 2 points or less
  • A home team with a pre-game 75% probability will win by 30 points or less
  • A home team with a pre-game 90% probability will win by 46 points or less

FUTURE ANALYSES

The functional form that I've assumed for the regression model in this blog is linear in the home team pre-game probability and therefore assumes that every 1 percentage point increase in the home team's pre-game probability estimate has a fixed (in points terms) effect on the home team margin at any percentile. For example, the home team median margin at the end of the game rises by 1.1 points for every 1 percentage point increase in home team probability (ie 1% of the 110.76 coefficient). This precludes, for example, these median margins from tracing out a Normal distribution, for which the tails must taper at large home team wins and large home team losses.

In a future blog I'll explore the efficacy of allowing the home team probability to enter in different forms (say squares and cubes).
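To make the distinction concrete, this sketch compares the effect of a 1 percentage point rise in home team probability under the current linear form and under a form with squared and cubed terms. The 110.76 slope is the Full Time median coefficient from the text, but the intercept is a placeholder (it cancels out), and the cubic coefficients are invented purely for illustration; they are not estimates. The function names are hypothetical.

```python
def linear_lead(p, a, b):
    """Current form: a + b x p."""
    return a + b * p


def cubic_lead(p, a, b, c, d):
    """Possible extension with squared and cubed probability terms."""
    return a + b * p + c * p ** 2 + d * p ** 3


def effect_of_one_point(model, p, *coefs):
    """Change in fitted lead when the home probability rises by 1 percentage point."""
    return model(p + 0.01, *coefs) - model(p, *coefs)


# Full Time median slope from the text; intercept 0.0 is a placeholder
LINEAR_FT = (0.0, 110.76)
# Invented coefficients, chosen only to show an effect that varies with p
CUBIC_EXAMPLE = (-70.0, 260.0, -360.0, 240.0)

for p in (0.10, 0.50, 0.89):
    lin = effect_of_one_point(linear_lead, p, *LINEAR_FT)
    cub = effect_of_one_point(cubic_lead, p, *CUBIC_EXAMPLE)
    print(f"p = {p:.2f}: linear {lin:.2f} points, cubic {cub:.2f} points")
```

Under the linear form the effect is 1.11 points whatever the starting probability, whereas the invented cubic gives about 1.92 points near the extremes and 0.80 points at even money.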