Matter of Stats

Victory Probabilities for Portions of Games

If the Home team is rated as a 75% chance of winning an upcoming game of AFL, what chance is it of winning the 1st quarter? The 2nd quarter? The 1st half? The 2nd half?

The challenge for today is to frame separate pre-game markets for these and a few other, similar wagers using only the TAB Bookmaker's pre-game prices. Before reading much further you might want to jot down your intuitions about how these markets will relate to the Home team's overall chances. Which quarter, for example, do you think a strong favourite would be most likely to win? The 1st because this is when it'll be keenest and least certain of winning? The 2nd because it'll take some time to assert its superiority? The 4th because by then it should have broken the spirit of its weaker competitor?

Remember that you'll need to assume you're making all your assessments prior to the game starting, without any knowledge of how earlier quarters have progressed when you're framing the market for the 2nd and subsequent quarters.

THE HOME TEAM IMPLICIT PROBABILITY

For today's blog I'm going to use the Bookmaker's Implicit Probability for the Home team as the way to express the relative chances of the two teams. While preparing this blog I did create, in a manner similar to that from an earlier blog, binary logits using various alternative specifications of the relative chances, such as log odds. Though Implicit Probability did not always perform best as measured by the models' AICs, it was always in the top 2 or 3.
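The post doesn't spell out how the Implicit Probability is derived from the TAB prices. One common convention, assumed here, is to normalise the inverse prices so that the two teams' probabilities sum to 1. The function name and the example prices below are hypothetical, so treat this as a sketch rather than the method actually used.

```r
# Assumption: implicit probabilities come from normalising inverse decimal
# prices so that Home and Away sum to 1. The post does not state the exact
# method; this is one common convention.
implicit_home_prob <- function(home_price, away_price) {
  stopifnot(all(home_price > 1), all(away_price > 1))
  inv_home <- 1 / home_price
  inv_away <- 1 / away_price
  inv_home / (inv_home + inv_away)
}

# Hypothetical prices: Home at $1.30, Away at $3.50
p_home <- implicit_home_prob(1.30, 3.50)
round(p_home, 3)  # 0.729
```

Under this convention the bookmaker's margin is removed proportionally from both teams, which is why the two inverse prices are divided by their sum.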

Another reason to favour this measure is that it directly provides exquisitely well-calibrated probability forecasts, as evidenced by the following chart of empirical victory probability for Home teams versus Implicit Probability.

The blue line is the output from fitting a Generalized Additive Model to the 0/1 result data for Home teams as a function of the TAB Bookmaker's Implicit probability, which I've done here using data (excluding draws) from seasons 2006 to 2012 and with the assistance of R's ggplot2 package.

(For the R-curious, the syntax is as follows: 

ggplot(fd_use_FT, aes(x=Implicit_Prob, y=FT_Result)) +
  scale_x_continuous("Implicit Home Team Probability") +
  scale_y_continuous("Probability of Home Team Leading at FT") +
  stat_smooth(geom = "smooth", formula = y ~ x, se = TRUE, n = 100, size = 2) +
  ggtitle("GAM Fitted to Density for Home Team Leading at Full Time vs Implicit Probability") +
  theme(axis.title.x = element_text(face="bold", colour="#990000", size=20),
        axis.text.x  = element_text(size=16), axis.text.y  = element_text(size=16),
        axis.title.y = element_text(face="bold", colour="#990000", size=20),
        legend.position = "none",
        plot.title = element_text(face="bold", size=24))

It's the stat_smooth component that produces the fitted GAM, which is ggplot2's go-to smoother when, as is the case here, you've a lot of data points [it's loess otherwise]. The GAM incorporates penalised regression splines, and setting geom to "smooth" draws the fitted line along with its confidence band. The n=100 setting determines the number of points at which the fitted curve is evaluated, and so how smooth the drawn line appears. Smaller values for n tend to produce charts that are visibly a series of line segments.)
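For anyone who'd rather fit the smoother directly than via stat_smooth, here's a minimal sketch using the mgcv package. The data are simulated (the fd_use_FT data frame isn't reproduced in this post), the column names simply mirror those in the ggplot2 call, and the binomial family is an assumption that suits 0/1 outcomes; the model that stat_smooth fits by default may differ.

```r
library(mgcv)

set.seed(2012)
n_games <- 1300  # illustrative sample size only

# Simulated stand-in for fd_use_FT: results are drawn from a perfectly
# calibrated bookmaker, so the fitted curve should sit near the diagonal.
sim <- data.frame(Implicit_Prob = runif(n_games, 0.10, 0.90))
sim$FT_Result <- rbinom(n_games, size = 1, prob = sim$Implicit_Prob)

# Penalised regression spline ("cs" = cubic regression spline with shrinkage).
# The binomial family is an assumption, chosen because the response is 0/1.
fit <- gam(FT_Result ~ s(Implicit_Prob, bs = "cs"),
           family = binomial, data = sim)

# Fitted Home team success rates at three reference points
ref <- data.frame(Implicit_Prob = c(0.25, 0.50, 0.75))
fitted_rate <- as.numeric(predict(fit, newdata = ref, type = "response"))
round(cbind(ref, fitted_rate), 2)
```

Swapping FT_Result for a 0/1 column recording the result of a single quarter or a half would give the equivalent fit for that portion of the game.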

The grey shadowed area in the chart defines a confidence interval around the fitted values, which gets fatter for more extreme implicit probabilities here as these are less frequently observed in the data, especially at the lower end of the scale.

Recall that a probabilistic forecaster's calibration is reflected in how closely actual outcomes reflect his or her probabilistic assessment of them: for a forecaster to be well-calibrated events that he or she rates as X% chances should occur about X% of the time. In the chart above such calibration would be indicated by a blue line that ran at a diagonal from bottom left to top right, passing through the points (0%, 0%), (50%, 50%) and (100%, 100%). That pretty much defines the line that we actually see.
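A cruder way to check calibration, without any smoothing, is to bin the forecasts and compare each bin's average forecast with its observed win rate. The sketch below does this on simulated data from a perfectly calibrated forecaster; the variable names and sample size are illustrative assumptions, not the data behind the chart.

```r
set.seed(2006)
n_games <- 1300  # illustrative sample size only

# Simulated forecasts and outcomes from a perfectly calibrated forecaster
implicit_prob <- runif(n_games, 0.05, 0.95)
home_win <- rbinom(n_games, size = 1, prob = implicit_prob)

# Group forecasts into bins 10 percentage points wide
bins <- cut(implicit_prob, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# For a well-calibrated forecaster, win_rate should track mean_forecast
calibration <- data.frame(
  mean_forecast = as.vector(tapply(implicit_prob, bins, mean)),
  win_rate      = as.vector(tapply(home_win, bins, mean)),
  games         = as.vector(table(bins)),
  row.names     = levels(bins)
)
round(calibration, 3)
```

The bins at either end hold fewer games, so their win rates bounce around more, which is the same effect that fattens the confidence band at extreme probabilities.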

THE FIRST QUARTER

What happens if we follow the same approach but fit the GAM instead to the results from the Home team's perspective for just the 1st quarter of the game (again ignoring draws)?

We get the chart on the left, the most obvious feature of which is the non-linear relationship it depicts between the Home team's Implicit Probability and its empirical success rate in 1st quarters.

A closer inspection reveals that, with the exception of a few values close to 50%, Home teams' empirical success rate in 1st quarters is always nearer to 50% than is their Implicit Probability of victory across the whole game.

For example, Home teams with an Implicit Probability of 25% win (according to the fitted model) about 37% of 1st quarters, while Home teams with an Implicit Probability of 75% win about 65% of 1st quarters.

The general rule then is that, if the Home team is the favourite, you need to lengthen their odds to set a fair price for winning the 1st quarter and, if the Home team is the underdog, you need to shorten their odds. The blue line gives you an idea of the extent to which you need to do this to make the wager fair.
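To put some prices on that rule, here's a small worked example that converts the probabilities quoted above into fair decimal prices, taking a fair price to be simply 1 divided by the probability (that is, with no bookmaker margin). The function name is hypothetical.

```r
# Fair (zero-margin) decimal price for an event with probability p
fair_price <- function(p) {
  stopifnot(all(p > 0 & p < 1))
  1 / p
}

# Figures quoted in the post: whole-game Implicit Probability alongside the
# fitted 1st-quarter success rate
prices <- data.frame(
  game_prob    = c(0.25, 0.75),
  quarter_prob = c(0.37, 0.65)
)
prices$game_price    <- round(fair_price(prices$game_prob), 2)
prices$quarter_price <- round(fair_price(prices$quarter_prob), 2)
prices
```

The 75% favourite moves from a fair price of about $1.33 for the game to about $1.54 for the 1st quarter, while the 25% underdog moves from $4.00 to about $2.70.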

THE SECOND QUARTER

Next we create the blue line for Home team's 2nd quarter performances.

Now, at least, the term "line" seems appropriate again, the fitted relationship between Implicit Probability and empirical results being far more linear in nature.

Once again we see that the Home team's empirical success rate is generally closer to 50% than its Implicit Probability.

Equal-favourite Home teams - that is, those with an Implicit Probability of 50% - win 2nd quarters at a rate slightly higher than 50% (which was also true for 1st quarters).

To the extent that it's possible for a line and a curve to be said to broadly pass through the same points, this is true of the blue line here and the one for 1st quarters shown above. Certainly it's the case if we look at the empirical success rates for Home teams with Implicit Probabilities of 25%, 50% and 75%.

THE THIRD QUARTER

It's sometimes said that the 3rd quarter is the "championship quarter", on which basis you might expect that teams entering games as favourites would win this quarter at a rate closer to - maybe even higher than - their Implicit Probability.

You'd be wrong.

The blue line for 3rd quarters looks more like the blue line for 1st quarters than it does the blue line for 2nd quarters but, remarkably, it too suggests similar success rates for Implicit Probabilities of 25% and 75%.

It does, however, behave in an interesting manner in its run up to an Implicit Probability of 50%, where it peaks at a success rate a little higher than for the 1st and 2nd quarters, then flat-spots in about the 50% to 70% Implicit Probability range where the empirical success rate hovers around 55%. About 30% of games start with the Home team's Implicit Probability in this range, and I'd surmise that a reasonable proportion of them are games where the underdog is close enough to the Home team favourite (or is even narrowly ahead of them) to encourage a little extra effort, thus depressing the Home team success rate relative to 1st and 2nd quarters.

Others of these games - especially those where the Home team's Implicit Probability is nearer 70% - I'd surmise are games where the Home team has already established a comfortable lead and so is starting to think about next week.

Whatever it is, it's not until we get to probabilities around 70% that the linear relationship between success rate and Implicit Probability reasserts itself.

THE FOURTH QUARTER

Lastly, at least as far as quarters go, we turn to the 4th quarter.

Once more we're back in the land of the linear and yet again we find a blue line passing through similar success rates for Implicit Probabilities of 25%, 50% and 75%.

In general then, perhaps with the exception of more extreme probabilities outside the 25% to 75% range, it seems that Home teams with a given pre-game Implicit Probability tend to win 1st, 2nd, 3rd and 4th quarters at about the same rate.

Prior to undertaking this analysis, that's not what I'd have assumed.

To demonstrate this, here's a chart that puts all four of the lines, one for each quarter of the contest, on the same set of axes.

As you can see, all of them intersect - or nearly do so - at Implicit Probabilities of 25% and 75%, and all but the line for the 3rd Quarter intersect at an Implicit Probability of 50%. (Nerd fact of the day, which I discovered while searching for a word to describe lines that pass through the same point: A pencil in projective geometry is a family of geometric objects with a common property, for example the set of lines that pass through a given point in a projective plane.)

The only notable divergences across the four lines are the higher success rates in 1st and 3rd quarters for Home teams with Implicit Probabilities above 75% (which comprise about 22% of games), the lower success rates in 3rd quarters for teams with Implicit Probabilities below 25% (which comprise about 10% of games), and the already-discussed flat-spot around an Implicit Probability of 50% for Home teams in 3rd quarters.

To some extent the smoothness in these lines - and the lack of it in some cases - might be attributed to the fitting and smoothing process, but I think the fitted GAMs do a good job of capturing the true, underlying relationships between Implicit Probabilities and quarter-by-quarter success rates without overfitting and tracking the noise in the raw data.
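The comparison across quarters can be tabulated as well as charted. The sketch below fits one GAM per quarter and reads off the fitted rates at the three reference points. Everything in it is an assumption for illustration: the data are simulated, the column names (Q1_Result and so on) and the helper function are hypothetical, and every quarter is given the same underlying win probability.

```r
library(mgcv)

# Fit one GAM per outcome column and tabulate fitted rates at reference points.
# `games` needs an Implicit_Prob column plus one 0/1 column per outcome.
fitted_at_refs <- function(games, outcomes, refs = c(0.25, 0.50, 0.75)) {
  ref_df <- data.frame(Implicit_Prob = refs)
  rates <- sapply(outcomes, function(outcome) {
    model_formula <- as.formula(paste(outcome, "~ s(Implicit_Prob, bs = 'cs')"))
    model <- gam(model_formula, family = binomial, data = games)
    as.numeric(predict(model, newdata = ref_df, type = "response"))
  })
  data.frame(Implicit_Prob = refs, rates)
}

set.seed(42)
n_games <- 1300  # illustrative sample size only
games <- data.frame(Implicit_Prob = runif(n_games, 0.10, 0.90))

# Assumption for the simulation only: each quarter's win probability sits
# 40% of the way from the game probability towards 50%.
quarter_prob <- 0.5 + 0.6 * (games$Implicit_Prob - 0.5)
quarter_cols <- paste0("Q", 1:4, "_Result")
for (q in quarter_cols) {
  games[[q]] <- rbinom(n_games, size = 1, prob = quarter_prob)
}

quarter_table <- fitted_at_refs(games, quarter_cols)
round(quarter_table, 2)
```

Because all four simulated quarters share one true curve, any differences between the columns of the table give a feel for how much the fitted lines can diverge through sampling noise alone.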

As well as looking at individual quarters we can, of course, look at larger pieces of each game, such as the two halves on their own, or the first three quarters of the game as a standalone unit.

THE FIRST HALF

If you think about it, the more a contest has progressed, the more the result should reflect the underlying strengths of the teams. To the extent then that the Bookmaker's pre-game prices accurately reflect these strengths, we'd expect the results for the 1st Half of a game to better match the Implicit Probabilities derived from these prices.

This is indeed what we observe in practice, though it's still the case that the Home team's success rate in the 1st Half tends to lie between 50% and its Implicit Probability. Now though its success rate is closer to its Implicit Probability than was the case for each of the Quarters.

Looking at the same reference points as earlier we find that Home teams with an Implicit Probability of 25% tend to win about 30% of 1st Halfs, that Home teams with an Implicit Probability of 50% win about 50%, and Home teams with 75% Implicit Probabilities win about 70% of 1st Halfs. Outside this range there's a slight bowing upwards.

Next we turn to a consideration of the game situation at the end of the 3rd Quarter.

THE FIRST THREE QUARTERS

If the logic I presented to explain what I expected to see for the 1st Half results holds up then the success rates of Home teams across the first three Quarters of the game should be even closer to their Implicit Probabilities.

Broadly, I think it's fair to say that this is true, though we do see slightly higher-than-expected success rates for Home teams with lower Implicit Probabilities.

What we also find is a fairly consistent slope across the entire range of Implicit Probabilities, with a 1% point increase in a Home team's Implicit Probability being associated, approximately, with a 1% point increase in its success rate.

Though we could look at other subdivisions of a game - for example the final three quarters, or just the 1st and 3rd quarters - the only other subdivision I want to explore in this blog is the 2nd Half.

THE SECOND HALF

A priori, I found it hard to come up with a logical reason to expect any particular pattern in the relationship between Implicit Probability and Home team success rates in the 2nd Half of games.

It turns out to be a bit more complicated than the near straight-line relationship for 1st Halfs. (I guess the plural should really be Halves, but that just doesn't seem right in the current context, so I'm going to stick with Halfs.)

Looking firstly at our three reference points we find that Home teams with an Implicit Probability of 25% win about 30% of 2nd Halfs, Home teams with an Implicit Probability of 50% win about 50% of 2nd Halfs, and teams with an Implicit Probability of 75% win a little over 70% of 2nd Halfs. These numbers are very similar to those for 1st Halfs. 

What's different about the line for 2nd Halfs though is the flattening in its slope for Implicit Probabilities in the 30% to 65% range, suggesting that Home teams' success rate increases more slowly with higher pre-game Implicit Probabilities than it does for Implicit Probabilities outside this range. Part of this behaviour is inherited from the non-linear relationship between Implicit Probability and success rate that we saw in the chart for 3rd Quarters.

To put the lines for the 1st Three Quarters and for the two Halfs in context, I've put them all on the same chart, along with the line for the result of the game at Fulltime. This highlights their broad similarity - again especially at the three reference points of 25%, 50% and 75% Implicit Probability.

(Again a side note for the R-curious: the labels that I've been able to attach to the ends of each line have been put there for me automatically by a package called directlabels, which is available from R-Forge and the CRAN repository and which is designed to work with "high-level plotting systems such as lattice and ggplot2". Without this package I'd have doubtless spent much time using the annotate option in ggplot2 attempting to achieve the same or a similar outcome.

All it took was the command direct.label(p, list("last.qp", cex=0.75)) where p held the current plot.)

SUMMARY

The principles I take out of this are that: 

  • a team's true odds for winning any single quarter will be nearer to even money than their odds for winning the entire game
  • this is also true of a team's chances of leading at Three Quarter time and of winning the 1st Half or the 2nd Half but, in these cases, their chances will also be nearer to their Implicit Probabilities than is the case for single quarters
  • for Implicit Probabilities in the range 25% to 75%, Home teams are generally no more likely to win any particular single quarter than any other (with the possible exception that Home teams with Implicit Probabilities near 50% win more 3rd quarters than 1st, 2nd or 4th quarters)
  • in this same Implicit Probability range they're also, broadly, no more likely to win one Half rather than the other or more likely to lead at Three Quarter time than they are to win a particular Half
  • the TAB Bookmaker is a disturbingly fine example of a well-calibrated forecaster
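As a rough rule of thumb, the reference points quoted in this post can be turned into a crude lookup by interpolating linearly between them. This is only a sketch: the function name is hypothetical, the quarter figures are those read off the 1st quarter line, the half figures are those for 1st Halfs, and the straight-line interpolation ignores the curvature visible in some of the fitted lines.

```r
# Reference points quoted in the post. The quarter value at 50% is taken as
# exactly 0.50 for simplicity (an assumption; the post reports it as
# slightly higher).
game_prob    <- c(0.25, 0.50, 0.75)
quarter_rate <- c(0.37, 0.50, 0.65)
half_rate    <- c(0.30, 0.50, 0.70)

# Crude piecewise-linear lookup, valid only between 25% and 75%
portion_prob <- function(p, rates) {
  stopifnot(all(p >= 0.25 & p <= 0.75))
  approx(x = game_prob, y = rates, xout = p)$y
}

portion_prob(0.60, quarter_rate)  # 0.56
portion_prob(0.60, half_rate)     # 0.58
```

So a Home team rated a 60% chance for the game would, on this crude reading, be about a 56% chance in any single quarter and about a 58% chance in either half.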