Careers as a Reflection of Underlying Ability
Life's fortunes, I suspect, are a lot more randomly determined than we allow ourselves to believe, and sporting careers, being subject to those same forces, are no different.
Recently I was thinking about the best way of modelling a batsman's final score in a completed Test innings when I came across a piece by Brendon Brewer from the University of New South Wales on just this topic. He surmises that, for most batsmen, each Test innings plays out over two phases: an initial phase, before the batsman is said to be "set" and during which he is more susceptible to dismissal, and a second and final phase during which his probability of dismissal slowly declines as each run is scored and asymptotes towards some fixed value.
To model this situation he derives a Hazard function, which quantifies the probability that a batsman is dismissed on a particular score conditional on the fact that he has attained that score. (Note that we're excluding not out scores in all our analyses or, put another way, we're considering only completed innings.)
The function has four parameters:
- Mu 1: which we can think of as a measure of the batsman's ability in the 1st phase of his innings
- Mu 2: which we can think of as a measure of the batsman's ability in the 2nd phase of his innings
- Tau: which is the mid-point of the transition between the two phases
- L: which controls the abruptness of the transition between the two phases
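To make the roles of the four parameters concrete, here is a minimal Python sketch of a hazard function of this kind. The logistic blend between the two phases is an assumption made for illustration and is not necessarily the functional form Brewer derives. The names `effective_average` and `hazard`, and the parameter values in the usage loop, are hypothetical.

```python
import math


def effective_average(score, mu1, mu2, tau, L):
    """Blend from mu1 (not yet set) to mu2 (set), centred on tau.

    ASSUMPTION: a logistic transition of width L (L must be > 0).
    As L shrinks towards 0 the blend approaches a step at tau.
    """
    z = (score - tau) / L
    # Numerically stable logistic weight in [0, 1].
    if z >= 0:
        weight = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        weight = e / (1.0 + e)
    return mu1 + (mu2 - mu1) * weight


def hazard(score, mu1, mu2, tau, L):
    """Probability of dismissal on `score`, given the batsman has reached it.

    A constant effective average mu corresponds to a geometric score
    distribution with mean mu, whose hazard is 1 / (mu + 1).
    """
    return 1.0 / (effective_average(score, mu1, mu2, tau, L) + 1.0)


# Hypothetical parameter values, chosen only to show the shape of the curve.
for s in (0, 5, 10, 20, 50):
    print(s, round(hazard(s, mu1=15.0, mu2=55.0, tau=5.0, L=2.8), 4))
```

With Mu 1 below Mu 2, the hazard starts high and falls towards 1 / (Mu 2 + 1) as the score grows, which is the two-phase behaviour described above.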
Brewer goes on to empirically estimate the parameters of this model for a number of Test batsmen. For my purposes here I'm going to use the values he calculated for Brian Lara, which I've replicated at right and then used to create a chart showing what they imply for the probability of dismissal at any given score.
The two-phase nature of what we're simulating is made clear by the chart, and the L value of 2.8 makes the transition from Phase 1 to Phase 2 somewhat abrupt (an L equal to 0 would make the transition a step function, which is as abrupt as it can be).
A batsman with the parameters shown here would be expected to average 49.4 runs per completed innings, a little below Brian Lara's actual average of 52.9, though that average included uncompleted (i.e. not out) innings.
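An expected average of this kind can be computed directly from any hazard function, without simulation. The sketch below is one way to do it in Python; the function name `expected_score` and the truncation point `max_score` are my own choices, and the constant hazard in the usage lines is a hypothetical stand-in rather than the Lara parameters.

```python
def expected_score(hazard, max_score=5000):
    """Mean completed-innings score implied by a per-score dismissal hazard.

    Uses E[X] = sum over x >= 0 of P(X > x), where P(X > x) is the
    probability of surviving every score from 0 to x inclusive.
    The sum is truncated at max_score, which is harmless provided the
    survival probability is negligible by then.
    """
    survival = 1.0
    total = 0.0
    for score in range(max_score + 1):
        survival *= 1.0 - hazard(score)  # survive being on `score`
        total += survival                # adds P(X > score)
    return total


# Hypothetical check: a constant hazard of 1/(mu + 1) should give a mean of mu.
mu = 50.0
print(expected_score(lambda score: 1.0 / (mu + 1.0)))
```

The constant-hazard case is a useful sanity check because the score distribution is then geometric, so the answer is known in advance.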
Simulating Careers
We're now going to take those parameters and the Score distribution they imply, and use them to model 20 careers, each 150 completed innings long, on the assumption that there is no innings-to-innings correlation in a batsman's scoring. In other words we'll exclude the possibility of form slumps and peaks, which might cause the Score distribution to vary from one innings to the next. (The existence of "form" is a testable claim, and one I might return to in a later blog.)
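For anyone wanting to try something similar, a simulation along these lines might be sketched in Python as follows. Only the L value of 2.8 comes from this post; the other parameter values are hypothetical placeholders, the logistic hazard is an assumed form, and all function names are mine. Its output will therefore not match the careers discussed here.

```python
import math
import random

# L = 2.8 is quoted in the text; MU1, MU2 and TAU are HYPOTHETICAL
# placeholders, not Brewer's estimates for Lara.
MU1, MU2, TAU, L = 15.0, 55.0, 5.0, 2.8


def hazard(score):
    """Assumed logistic two-phase hazard (illustrative form only)."""
    z = (score - TAU) / L
    if z >= 0:
        weight = 1.0 / (1.0 + math.exp(-z))
    else:
        weight = math.exp(z) / (1.0 + math.exp(z))
    return 1.0 / (MU1 + (MU2 - MU1) * weight + 1.0)


def simulate_innings(rng):
    """Advance one run at a time until the batsman is dismissed."""
    score = 0
    while rng.random() >= hazard(score):  # survived being on `score`
        score += 1
    return score


def simulate_careers(n_careers=20, n_innings=150, seed=0):
    """Independent innings: no form slumps or peaks are modelled."""
    rng = random.Random(seed)
    return [
        [simulate_innings(rng) for _ in range(n_innings)]
        for _ in range(n_careers)
    ]


careers = simulate_careers()
for number, scores in enumerate(careers, start=1):
    print(f"Batsman #{number}: average {sum(scores) / len(scores):.1f}")
```

Stepping one run at a time is a simplification, since real scores also advance in twos, fours and sixes, but it is enough to turn a per-score hazard into a distribution of final scores.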
Here are the results:
Recognise that these are the careers for 20 batsmen of identical ability, and that any one of those careers is just as likely to have transpired as any other.
Batsman #2 becomes a legend of the game, with an average over 60 as well as 6 scores over 200 and 33 centuries. During his entire career, the longest he goes without a score of 50 or more is just 6 innings. He converts into centuries almost 50% of the time he reaches 50.
Contrast this with Batsman #12 who, truly, never realises his actual potential, finishing his lengthy career (if it were allowed to go that long) with an average of just over 38, no double hundreds, just 12 centuries, and 14 ducks. Much of his (imaginary) career was no doubt spent defending his inability to convert 50s into 100s since he achieved this only 27% of the time. He also endured a period in his career when he went 11 innings without a score over 50.
Batsman #19, however, had a more dramatic "form slump", going 15 innings without a score over 50. Still, his 21 centuries and 5 double centuries ensured his somewhat elevated place in cricket history with a career average of 46.3 runs per completed innings.
As you cast your eyes across the other careers you'll spot other archetypes - Batsman #13 and Batsman #18 who are known for converting 50s into 100s, often big ones; Batsman #14, who rarely goes cheaply but only once has converted a century into a double; and Batsman #17, who's good for 50 but only converts about 40% of the time - though he has reeled off those 5 superlative double tons.
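The career statistics quoted for these batsmen are all simple functions of a list of completed-innings scores. A rough Python sketch of how they might be tallied is below. The function name, the dictionary keys and the example scores are hypothetical, and I've assumed that a "50" means a score of 50 or more and that the conversion rate is centuries divided by scores of 50 or more.

```python
def summarise_career(scores):
    """Summary statistics for a list of completed-innings scores."""
    fifties_plus = sum(1 for s in scores if s >= 50)
    centuries = sum(1 for s in scores if s >= 100)

    # Longest run of consecutive innings without a score of 50 or more.
    longest = current = 0
    for s in scores:
        current = current + 1 if s < 50 else 0
        longest = max(longest, current)

    return {
        "average": sum(scores) / len(scores),
        "ducks": sum(1 for s in scores if s == 0),
        "centuries": centuries,
        "double_centuries": sum(1 for s in scores if s >= 200),
        # None when the batsman never reached 50, so there is nothing to convert.
        "conversion_rate": centuries / fifties_plus if fifties_plus else None,
        "longest_run_without_50": longest,
    }


# Made-up scores, purely to show the shape of the output.
print(summarise_career([0, 120, 55, 10, 20, 30, 205, 49]))
```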
Now there's nothing contrived about the results I've presented here. They are the genuine results of a single simulation of 20 careers. The combined average of the 20 careers is 49.3 runs per completed innings, which is entirely consistent with the long-run expectation of 49.4 runs per completed innings.
What they show is the extent of the natural variability in scoring amongst players of equal proficiency, even across a relatively long (in cricketing terms) career. Shorter careers would exhibit even greater variability.
It would be interesting, in fact, to take this same model and overlay some simple selectorial rules that might serve to truncate careers. For example, we might assume that a player making X scores below Y in Z innings would be dropped for a fixed number of games or dropped permanently, thereby shortening the number of innings over which his entire career plays out. Other stochastic elements could also be added, such as the number of games missed due to a "form slump".
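One simple version of such a rule, a permanent drop once X of the most recent Z scores fall below Y, could be sketched in Python like this. The function name `truncate_career`, its parameter names and the example scores are hypothetical, and this is just one of many ways the rule could be defined.

```python
def truncate_career(scores, x, y, z):
    """Return the innings actually played before a permanent drop.

    The player is dropped as soon as at least `x` of his most recent
    `z` scores are below `y`. If that never happens, the whole career
    is returned unchanged.
    """
    for played in range(1, len(scores) + 1):
        window = scores[max(0, played - z):played]  # last z innings so far
        if sum(1 for s in window if s < y) >= x:
            return scores[:played]
    return list(scores)


# Made-up career: dropped after three straight scores under 20.
print(truncate_career([10, 5, 3, 60, 2, 1, 0, 80], x=3, y=20, z=4))
```

Applying a function like this to each simulated career before computing averages would show how much a selection policy, rather than ability, shapes the careers we get to observe.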
Summary and Conclusion
None of this analysis diminishes the legacy of the legends of the game. Players with higher career batting averages are statistically more likely to have greater batting prowess than players with lower averages. But the variability we've witnessed here suggests that there will be exceptions, in both directions - players whose averages substantially understate and players whose averages substantially overstate their true underlying ability.
I think sometimes we'd do well to recognise that the scores we're witnessing might indicate not - or not only - a form slump or a "golden period", but instead a momentary run of landing on the right or wrong side of random fluctuation.