Team Scoring Model Parameter Sensitivity

May 28, 2015 Tony Corke

In recent blogs we've been exploring a range of topics related to team scoring, all of them based on a model I created in a series of blogs, culminating in this one from August 2014. This model assumes that:

Teams generate Scoring Shots consistent with a Bivariate Negative Binomial model with a fixed variance-covariance matrix and with a mean vector determined by the relative abilities of the teams and the venue on which the game is played
Those Scoring Shots are converted into Goals for each team consistent with a univariate BetaBinomial distribution with fixed Conversion rate and theta parameters, the latter taking on different values for Home teams and Away teams
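
In symbols, the two-stage structure can be written roughly as below. The notation (S for Scoring Shots, G for Goals, c for Conversion rate, t for team) is introduced here for illustration and does not come from the original model write-up. The last line assumes standard AFL scoring, with 6 points for a goal and 1 point for a behind, and every Scoring Shot that isn't a goal registering as a behind.

```latex
\begin{aligned}
(S_H, S_A) &\sim \text{BivariateNegBin}(\boldsymbol{\mu}, \Sigma) \\
G_t \mid S_t &\sim \text{BetaBinomial}(S_t, c_t, \theta_t), \qquad t \in \{H, A\} \\
\text{Score}_t &= 6\,G_t + (S_t - G_t)
\end{aligned}
```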

Using this model requires us to set nine parameters:

Mean Scoring Shots (two parameters, one for the Home team and one for the Away team)
Size parameters for the bivariate Scoring Shot distribution (again, two parameters). These determine, along with the means, the underlying variability of Scoring Shot production for the teams.
The Correlation between Home team and Away team Scoring Shot production (a single parameter)
Mean Conversion rates (also two parameters)
Theta parameters for the Goal distributions (also two parameters). These determine the underlying variability of Goal scoring.
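
To make the role of the nine parameters concrete, here is a minimal simulation sketch in Python. It is not the code behind the results in this post, and several things in it are assumptions: the correlation is induced with a Gaussian copula (one convenient construction of a bivariate negative binomial, under which the correlation of the simulated counts will be close to, but not exactly, the value passed in), the beta-binomial uses shape parameters theta × rate and theta × (1 − rate), draws are counted as non-wins, and all the numeric values except the 53% Conversion rate are hypothetical placeholders rather than the fitted estimates.

```python
import bisect
import math
import random
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class TeamParams:
    mean_ss: float     # mean Scoring Shots
    size: float        # negative binomial Size (larger = less variable)
    conversion: float  # mean Conversion rate
    theta: float       # beta-binomial Theta (larger = less variable)


def nb_cdf_table(mean, size, tail=1e-10):
    """Cumulative probabilities P(X <= k), k = 0, 1, ..., for a negative
    binomial in the mean/size parameterisation (variance = mean + mean^2/size)."""
    p = size / (size + mean)
    pmf = p ** size
    cdf = [pmf]
    k = 0
    while 1.0 - cdf[-1] > tail and k < 1000:
        pmf *= (k + size) / (k + 1) * (1.0 - p)  # pmf recursion k -> k + 1
        cdf.append(cdf[-1] + pmf)
        k += 1
    return cdf


def estimate_home_win_prob(home, away, rho, n_sims=10_000, seed=1):
    """Return (estimated Home win probability, Monte Carlo standard error)."""
    rng = random.Random(seed)
    norm = NormalDist()
    teams = (home, away)
    tables = [nb_cdf_table(t.mean_ss, t.size) for t in teams]
    wins = 0
    for _ in range(n_sims):
        # Correlated standard normals (Gaussian copula; an assumption).
        z_home = rng.gauss(0.0, 1.0)
        z_away = rho * z_home + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        points = []
        for team, table, z in zip(teams, tables, (z_home, z_away)):
            # Invert the negative binomial CDF to get Scoring Shots.
            shots = bisect.bisect_left(table, norm.cdf(z))
            # Beta-binomial Goals: draw a game-day conversion rate, then convert.
            a = team.theta * team.conversion
            b = team.theta * (1.0 - team.conversion)
            rate = rng.betavariate(a, b)
            goals = sum(rng.random() < rate for _ in range(shots))
            points.append(6 * goals + (shots - goals))  # goals 6, behinds 1
        if points[0] > points[1]:  # draws count as non-wins (assumption)
            wins += 1
    p_hat = wins / n_sims
    return p_hat, math.sqrt(p_hat * (1.0 - p_hat) / n_sims)


# Hypothetical parameter values, for illustration only.
home = TeamParams(mean_ss=27.0, size=30.0, conversion=0.53, theta=100.0)
away = TeamParams(mean_ss=24.0, size=30.0, conversion=0.53, theta=100.0)
p_hat, se = estimate_home_win_prob(home, away, rho=-0.2)
print(f"Home win probability ~ {p_hat:.3f} (standard error {se:.3f})")
```

The standard error is worth keeping an eye on when comparing scenarios. With 1,000,000 replicates it can be no larger than sqrt(0.25 / 1,000,000) = 0.0005, or 0.05 percentage points, so differences of a quarter of a percentage point or more are unlikely to be simulation noise.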

To date, I've used estimates for the Size, Theta and Correlation parameters based on fitting models to empirical game data from the period 2006 to 2013. I've then taken these parameters as fixed and explored the variability in game outcomes as a function of Scoring Shot production (for example, in this original piece from when the model was first derived and this more recent piece on mapping game margins to probabilities) and as a function of Scoring Shot Conversion (for example, this recent piece on the importance of goal-kicking accuracy).

I've not performed any of what those in the business world commonly refer to as "due diligence". Put simply, I've not estimated the effects of my assumptions about the Size, Theta and Correlation parameters on the outcomes I've been exploring.

Today I'll address that omission by investigating the sensitivity of a Home team's probability of victory to those parameters. Specifically, I'll consider five different Home teams that vary in the extent to which their Scoring Shot production is expected to exceed or fall short of their Away team opponents' Scoring Shot production. For each of these five teams I'll estimate, via simulation, their chances of victory under different assumptions for their Size and Theta parameters, and under different assumptions for the correlation between their own and their opponent's Scoring Shot production.

In all the simulations, the Conversion rates for both teams are held fixed at 53%, and the Away team's Size and Theta parameters are held fixed at their base levels. We do, however, vary the Away team's Mean Scoring Shot parameter.

The results of the simulations appear in the table below, which is organised such that each block pertains to one of the five Home teams being considered. Each block is ordered internally in the same way: the first row provides the input parameters and simulation outputs (n = 1,000,000) for the base parameters, the next two rows investigate variability in the Home team's Size parameter, the next two variability in the Theta parameter, and the last three variability in the correlation between Home and Away team Scoring Shot production.
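
The layout of each block can be sketched as a small scenario grid in Python. Everything numeric here is a hypothetical placeholder: the five Home/Away mean pairs, the base Size, Theta and correlation values, and the three alternative correlations are not the values used in the simulations, and the halving and doubling factors are simply one natural choice of variation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    home_mean_ss: float
    away_mean_ss: float
    home_size: float
    home_theta: float
    ss_correlation: float
    label: str = "base"


# Hypothetical values, for illustration only.
MEAN_PAIRS = [(21.0, 29.0), (23.0, 27.0), (25.0, 25.0), (27.0, 23.0), (29.0, 21.0)]
BASE_SIZE, BASE_THETA, BASE_CORRELATION = 30.0, 100.0, -0.2
ALT_CORRELATIONS = (-0.4, 0.0, 0.2)


def build_block(base):
    """Eight rows: base, two Size rows, two Theta rows, three correlation rows."""
    rows = [base]
    for factor in (0.5, 2.0):
        rows.append(replace(base, home_size=base.home_size * factor,
                            label=f"Size x{factor}"))
    for factor in (0.5, 2.0):
        rows.append(replace(base, home_theta=base.home_theta * factor,
                            label=f"Theta x{factor}"))
    for rho in ALT_CORRELATIONS:
        rows.append(replace(base, ss_correlation=rho, label=f"Correlation {rho}"))
    return rows


# One block per Home team; only one Home-team parameter changes per row.
table = [
    build_block(Scenario(home, away, BASE_SIZE, BASE_THETA, BASE_CORRELATION))
    for home, away in MEAN_PAIRS
]

for row in table[0]:
    print(row.label, row.home_size, row.home_theta, row.ss_correlation)
```

Varying one parameter at a time from a common base row is what lets each row's change in victory probability be attributed to a single parameter.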

The table reveals that:

Even quite different values of the parameters being varied have only small effects on the simulated Home team victory probability. Doubling and halving the Size and Theta parameters, for example, produces changes in the Home team's victory chances of only about one-quarter to one-third of a percentage point.
The largest effects are related to changes in the Scoring Shot correlation parameter, though even these effects are smaller than 2 percentage points.
Stronger (ie more negative) levels of correlation in Scoring Shot production benefit underdog teams and are detrimental to favourites, the extent of the benefit or damage being positively related to the degree of favouritism or underdoggedness (we've had this chat before - favouritism needs an antonym).
Favourites also benefit, though much less so, from higher values of the Size parameter (which imply lower values for the variability of their Scoring Shot production) and from higher values of the Theta parameter (which imply less variability in their Goal production). For underdogs, the opposite is true. This result is analogous to what we found in this earlier blog on the benefits to weaker teams from greater variability.
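
The direction of these effects can be seen from the variance of the difference in Scoring Shots. This is a simplified sketch: it looks at the Scoring Shot margin rather than the points margin, and it assumes the common mean/size parameterisation of the negative binomial, which is at least consistent with higher Size values implying lower variability. The symbols are introduced here for illustration.

```latex
\begin{aligned}
\operatorname{Var}(S_t) &= \mu_t + \frac{\mu_t^2}{k_t}, \qquad t \in \{H, A\} \\
\operatorname{Var}(S_H - S_A) &= \operatorname{Var}(S_H) + \operatorname{Var}(S_A)
  - 2\rho\,\sqrt{\operatorname{Var}(S_H)\,\operatorname{Var}(S_A)}
\end{aligned}
```

Here k is the Size parameter and ρ the Scoring Shot correlation. A more negative ρ or a smaller k widens the distribution of the margin around its mean, which moves probability across zero in the underdog's favour; a larger k or a less negative ρ narrows it, which helps the favourite.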

CONCLUSION

Assessing this analysis in the context of the earlier work we've done varying others of the nine parameters in the team scoring model suggests that the parameters fall naturally into two groups:

Those with moderate to significant impact on a team's victory chances: Mean Scoring Shot Production, Conversion Rate and Scoring Shot Correlation
Those with only minor impact: Size and Theta

In a practical sense this implies that a modeller seeking to predict the result of an upcoming game should spend more time attempting to model the teams' Expected Scoring Shot Production and Expected Conversion rate, as well as considering what features of a contest might lead to higher or lower correlation in Scoring Shot Production. Less time should be spent attempting to model the teams' Theta and Size parameters.

Put another way, the focus should be on modelling mean Scoring Shot production, expected Conversion rates, and likely Scoring Shot Correlation, and less on modelling the variability of Scoring Shot and Goal production.