The Chase Australia : I'll Take Seat Four Thanks Larry
In the last blog we looked at how precisely we could forecast the High and Low Chaser offers using data from this Google document from James Spencer, which covers the entire history of Andrew O’Keefe’s reign as host from late 2015 to mid-2021.
There are other interesting questions that we can investigate using this data, and today we’ll analyse how a contestant’s estimated probability of getting home depends on how well he or she did in the Cash Builder, which offer he or she chose, and other features of the episode.
THE DATA
Again, we have data for 939 episodes, from which we’ll exclude:
any episode where there were only two contestants (9 episodes)
any episode that aired on a Sunday (3 episodes - these episodes look qualitatively different)
any episode where the high or low offers are not available in the data (1 episode)
That leaves us with data for 3,704 contestants from 926 episodes.
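Assuming the three exclusion groups don't overlap (which the totals imply), the counts reconcile exactly, with every retained episode contributing four contestants:

```python
# Episode counts quoted in the text
total_episodes = 939
excluded = {
    "two contestants only": 9,
    "aired on a Sunday": 3,
    "high or low offer missing": 1,
}

# Assumes no episode falls into more than one exclusion group
episodes_used = total_episodes - sum(excluded.values())
contestants_used = episodes_used * 4   # four contestants per retained episode

print(episodes_used, contestants_used)  # 926 3704
```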
BUILDING A PREDICTIVE MODEL
For today’s blog I’m going to build a binary logit model where the target variable is 0 or 1 depending on whether or not the contestant got safely home. As potential explanatory variables, we’ll include:
The Chaser name
The Year in which the episode aired (treated as a categorical variable)
The Day of the Week on which the episode aired (treated as a categorical variable)
The Seat Number of the contestant in question (1 to 4, treated as a categorical variable)
The Cash Builder Amount that the contestant earned (in dollars)
The Number of Players who were already home before this contestant was up (0 to 3)
The Amount of Money already banked by earlier contestants (in dollars)
The Offer Chosen by the contestant (one of Low, Middle, or High)
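Treating a variable as categorical means the model estimates a separate effect for every level except one baseline level, and the intercept absorbs the baseline. Below is a minimal Python sketch of this treatment (dummy) coding.

* The level lists are assumptions, based on the Chasers, years and weekdays mentioned in this post.
* The baselines (Anne, 2015, Friday, Seat 1, Low) are inferred from the exemplar contestant used later to interpret the coefficients.
* `dummy_code` is a hypothetical helper, not the code used for this analysis.
* The numeric variables (Cash Builder Amount, Number of Players home, Amount of Money banked) would enter the model as-is.

```python
# Levels of each categorical predictor; the FIRST entry is the baseline.
# Level lists and baselines are assumptions inferred from the post.
LEVELS = {
    "Chaser": ["Anne", "Brydon", "Cheryl", "Issa", "Mark", "Matt", "Shaun"],
    "Year": [2015, 2016, 2017, 2018, 2019, 2020, 2021],
    "Weekday": ["Friday", "Monday", "Tuesday", "Wednesday", "Thursday"],
    "SeatNumber": [1, 2, 3, 4],
    "OfferChosen": ["Low", "Middle", "High"],
}


def dummy_code(row):
    """Return {column_name: 0 or 1} indicators for every non-baseline level."""
    features = {}
    for var, levels in LEVELS.items():
        value = row[var]
        if value not in levels:
            raise ValueError(f"unknown level {value!r} for {var}")
        for level in levels[1:]:  # the baseline level gets no column
            features[f"{var}{level}"] = int(value == level)
    return features


row = {"Chaser": "Cheryl", "Year": 2015, "Weekday": "Friday",
       "SeatNumber": 3, "OfferChosen": "Low"}
coded = dummy_code(row)
print({k: v for k, v in coded.items() if v})  # {'ChaserCheryl': 1, 'SeatNumber3': 1}
```

The baseline contestant produces all-zero indicators, which is why the intercept alone gives that contestant's fitted log-odds.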
We’ll again split the available data 70/30 into a training and a test set, and then fit models to the training data.
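The post doesn't say whether the split was made at the contestant or the episode level. The sketch below therefore simply shuffles contestants at random. The function name and seed are hypothetical.

```python
import random


def train_test_split(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)                  # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


contestant_ids = range(3704)               # one id per retained contestant
train, test = train_test_split(contestant_ids)
print(len(train), len(test))               # 2593 1111
```

An alternative is to split by episode: shuffle the 926 episode ids and keep all four contestants of each episode together. This stops contestants who shared a Chaser and a game situation from landing in both sets, which is arguably a cleaner test for overfitting.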
THE MODEL
I investigated a number of plausible functional forms, but the model producing the smallest AIC was the simplest:
GotHome ~ Chaser + SeatNumber + NumberContestantsHome + MoneyBanked + CashBuilder + Weekday + Year + OfferChosen
(The other models fitted were one with a CashBuilder and OfferChosen interaction term, and another that used the actual dollar amount offered rather than just whether it was the Low, Middle, or High Offer. Both had higher AIC values and so were rejected.)
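For reference, AIC trades off fit against complexity as AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximised likelihood.

The sketch below counts parameters for the chosen model under assumed level counts: 7 Chasers, 7 years from 2015 to 2021, 5 weekdays, 4 seats and 3 offers. It then compares that model with the CashBuilder x OfferChosen variant using made-up log-likelihoods. None of these figures come from the actual fits.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); smaller is better."""
    return 2 * n_params - 2 * log_likelihood


# Intercept + one coefficient per non-baseline level + one per numeric term
# (level counts are assumptions, see above)
n_main = (
    1      # intercept
    + 6    # Chaser: 7 levels
    + 3    # SeatNumber: 4 levels
    + 3    # NumberContestantsHome, MoneyBanked, CashBuilder (numeric)
    + 4    # Weekday: 5 levels
    + 6    # Year: 7 levels
    + 2    # OfferChosen: 3 levels
)
n_interaction = n_main + 2   # CashBuilder x Middle, CashBuilder x High

# Hypothetical log-likelihoods: the interaction improves the fit only slightly
candidates = {
    "main effects only": aic(-1500.0, n_main),
    "with CashBuilder:OfferChosen": aic(-1499.5, n_interaction),
}
best = min(candidates, key=candidates.get)
print(n_main, candidates, best)
```

Each extra parameter must raise the log-likelihood by more than 1 to pay for itself. The interaction's two extra terms therefore need an improvement of more than 2 before AIC prefers the more complex model.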
The fitted model is summarised at right, and we can see from comparing the summary statistics on the training and test data that there is no sign of overfitting. We can also see that, using a threshold probability of 50% to decide whether the model predicts a contestant will or will not get home, the model has high Sensitivity (ie it correctly identifies a large proportion of the contestants who do get home) but low Specificity (ie it correctly identifies a relatively small proportion of the contestants who do not get home). With this threshold, of the contestants it predicts will get home, about 70% do, and of those it predicts will not get home, about 60% do not.
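All four rates above come from a 2x2 confusion matrix. In standard terminology, the "about 70%" and "about 60%" figures are the positive and negative predictive values.

The sketch below computes all four rates at a 50% threshold. It treats a probability exactly at the threshold as "predicted home", which is an assumption. The probabilities and outcomes are made-up toy values, not output from the model.

```python
def confusion_counts(probs, outcomes, threshold=0.5):
    """Tally a 2x2 confusion matrix; 'positive' means the contestant got home."""
    tp = fn = fp = tn = 0
    for p, got_home in zip(probs, outcomes):
        predicted_home = p >= threshold
        if got_home and predicted_home:
            tp += 1
        elif got_home:
            fn += 1
        elif predicted_home:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def rates(tp, fn, fp, tn):
    return {
        "sensitivity": tp / (tp + fn),   # actual homecomers predicted home
        "specificity": tn / (tn + fp),   # actual non-homecomers predicted not home
        "ppv": tp / (tp + fp),           # predicted home who did get home
        "npv": tn / (tn + fn),           # predicted not home who did not
    }


# Toy values for illustration only (not from the model)
probs = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3]
outcomes = [1, 1, 0, 1, 0, 1]
summary = rates(*confusion_counts(probs, outcomes))
print(summary)
```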
This suggests that a contestant’s success is relatively easier to predict, given the data we have, than is his or her failure. Given that the only obvious measure we have of contestant ability is his or her Cash Builder amount (although Seat Number might also contain some information about relative ability), it might be that a weaker contestant’s multiple-choice ability is less well proxied by their Cash Builder total than is a stronger contestant’s.
We can interpret the model coefficients as follows:
The fitted probability for a contestant in Seat 1 who registered $0 in his or her Cash Builder, chose the Low Offer, and faced Anne Hegerty in an episode that aired on a Friday in 2015 is 1/(1+exp(-(-0.119))), which is 47%. By way of comparison, across all episodes in the sample, about 63% of contestants get home.
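That calculation is the inverse of the logit link applied to the intercept alone. For the baseline contestant, every other term in the model is zero. A minimal Python sketch (the function name is ours):

```python
import math


def inv_logit(eta):
    """Map a log-odds value (the linear predictor) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Baseline contestant: every dummy and numeric term is zero, leaving the intercept
intercept = -0.119
p_base = inv_logit(intercept)
print(f"{p_base:.0%}")  # 47%
```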
Contestants facing Cheryl, Matt, Shaun, Brydon or Mark are more likely to get through (all else being equal) than if they faced Anne, but slightly less likely if they faced Issa. Note, however, that these differences are not statistically significant at the 5% level. To give you some idea of the effect size, the contestant described above, with an estimated 47% chance against Anne, would have an estimated 58% chance against Cheryl and a 46% chance against Issa. It’s interesting to speculate whether the show’s producers try to match the Chaser to the average ability of the contestants in a particular episode, which would tend to reduce the estimated effect of differing Chaser abilities.
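Because the model is additive on the log-odds scale, each quoted probability can be turned back into an approximate coefficient. A Chaser's effect is the log-odds against that Chaser minus the log-odds against Anne.

The sketch below does this for Cheryl and Issa, giving about +0.44 and -0.04. The percentages in the text are rounded, so these results are approximations, not the model's reported coefficients.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


p_anne = 0.47                      # exemplar contestant vs Anne (rounded)
quoted = {"Cheryl": 0.58, "Issa": 0.46}

# Implied Chaser coefficient relative to Anne (approximate: inputs are rounded)
implied = {name: logit(p) - logit(p_anne) for name, p in quoted.items()}
for name, shift in implied.items():
    print(f"{name}: {shift:+.2f}")
```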
The probability of getting home rises with Seat Number. This could be because contestants are seated in order of ability, or because the questions get easier for later contestants. More on this later. In the meantime, the estimated probability for our exemplar contestant rises to 56% if he or she is in Seat 2, 67% if in Seat 3, and a whopping 76% if in Seat 4.
The probability of getting home falls for every additional contestant who was already back home, and for every dollar that has already been banked, at the time the contestant in question stepped onto the stage. This is fairly strong evidence that question difficulty is altered based on the game situation.
Here are the estimated probabilities for some plausible scenarios for that same exemplar contestant:
Seat 2 / None Home : 56%
Seat 2 / One Home / $8,000 banked : 45%
Seat 2 / One Home / $16,000 banked : 40%
Seat 3 / None Home : 67%
Seat 3 / One Home / $8,000 banked : 57%
Seat 3 / One Home / $16,000 banked : 52%
Seat 3 / Two Home / $20,000 banked : 43%
Seat 4 / None Home : 76%
Seat 4 / One Home / $8,000 banked : 67%
Seat 4 / One Home / $16,000 banked : 63%
Seat 4 / Two Home / $20,000 banked : 54%
Seat 4 / Three Home / $30,000 banked : 41%
As we would expect, better Cash Builder performances are associated with higher estimated probabilities of getting safely home. Returning to our original exemplar contestant in Seat 1 facing Anne, whose base estimated probability was 47% with a $0 Cash Builder, we find that this probability rises to 57% with an $8,000 Cash Builder, and to 67% with a $16,000 Cash Builder.
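Because the model is additive on the log-odds scale, most of the scenario list can be rebuilt from a few of its own entries. The sketch below assumes Number of Players home and Amount of Money banked enter as plain numeric terms, as their 0 to 3 and dollar descriptions suggest. It derives three approximate effects:

* per player home and per dollar banked, from the Seat 2 rows alone;
* per dollar of Cash Builder, from the $0 and $8,000 figures.

It then predicts rows that were not used in the derivation. These are back-calculations from rounded percentages, not the fitted coefficients.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# Quoted (rounded) probabilities with nobody home and nothing banked
seat_base = {2: 0.56, 3: 0.67, 4: 0.76}

# Approximate per-dollar effect of money banked: Seat 2, one home, $16,000 vs $8,000
money_coef = (logit(0.40) - logit(0.45)) / 8_000
# Approximate per-player effect: Seat 2, one home / $8,000 vs none home / $0
home_coef = logit(0.45) - logit(seat_base[2]) - 8_000 * money_coef
# Approximate per-dollar Cash Builder effect: Seat 1 exemplar, $8,000 vs $0
cash_coef = (logit(0.57) - logit(0.47)) / 8_000


def scenario(seat, n_home, banked):
    """Additive log-odds: seat baseline + players-home term + money-banked term."""
    eta = logit(seat_base[seat]) + n_home * home_coef + banked * money_coef
    return inv_logit(eta)


# Predict rows that were NOT used to derive the coefficients
for seat, n_home, banked in [(3, 2, 20_000), (4, 2, 20_000), (4, 3, 30_000)]:
    print(f"Seat {seat} / {n_home} home / ${banked:,}: {scenario(seat, n_home, banked):.0%}")

# Exemplar with a $16,000 Cash Builder (quoted in the text as 67%)
print(f"Cash Builder $16,000: {inv_logit(logit(0.47) + 16_000 * cash_coef):.0%}")
```

The implied effects are roughly:

* -0.24 log-odds per player already home;
* -0.2 log-odds per $8,000 banked;
* +0.4 log-odds per $8,000 of Cash Builder.

The predicted Seat 3 / Two Home, Seat 4 / Two Home and Seat 4 / Three Home figures land within about a percentage point of the listed 43%, 54% and 41%. The $16,000 Cash Builder estimate likewise lands within a point of 67%.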
The airing day has a significant influence on the estimated probability of a contestant progressing, but I suspect that is more down to episode scheduling choices than to the difficulty of the questions. As I understand it, the episodes are not aired in the order in which they are shot, and the model shows that contestants fare best on episodes that air earlier in the week. Returning again to our original exemplar contestant, whose episode we assumed aired on a Friday, we have the following estimates had that episode aired on a different weekday:
Monday : 85%
Tuesday : 74%
Wednesday : 65%
Thursday : 45%
Friday : 47% (as above)
The year of airing has a much smaller effect on estimated probabilities, which suggests that the average question difficulty has not changed much over time (or has changed in step with average contestant ability). For our exemplar contestant, whose episode we assumed aired in 2015, estimated probabilities range from 41% had it aired in 2018 to 51% had it aired in 2017. Note that none of the differences in these coefficients is statistically significant at the 5% level.
Lastly, the contestant’s choice of offer has a huge effect on his or her estimated probability of getting home. Our exemplar contestant, who we assumed took the Low Offer, sees his or her estimate fall to 24% if he or she takes the Middle Offer instead, and to 7% if he or she takes the High Offer.
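On the log-odds scale, the offer effects are the same for every contestant, so they are perhaps most portable when expressed as odds ratios. The sketch below converts the exemplar's quoted 47%, 24% and 7% into approximate odds ratios relative to the Low Offer. The inputs are rounded, so the outputs are rough.

```python
def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)


p_low = 0.47                       # exemplar contestant, Low Offer (rounded)
quoted = {"Middle": 0.24, "High": 0.07}

for offer, p in quoted.items():
    ratio = odds(p) / odds(p_low)
    print(f"{offer} vs Low: odds ratio ~ {ratio:.2f}")
```

Taking the Middle Offer cuts the odds of getting home to roughly a third of the Low Offer odds. Taking the High Offer cuts them to under a tenth. Because these are ratios of odds rather than differences in probability, the same offer choice moves different contestants' probabilities by different numbers of percentage points.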
We can make a quantitative assessment of the relative importance of each of the variables in this model by creating a variable importance plot, which we do below.
It confirms that, by far, the contestant’s offer choice is the most important variable in the model for predicting his or her likelihood of progressing. Of the five most important variables, two relate to Offer Choice and two more to Seat Number, with the fifth relating to the Weekday of Airing.
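To our understanding, the vip package's model-based importance for a glm is the absolute value of each coefficient's z-statistic (estimate divided by standard error). Each non-baseline level is a separate term, which is why two Offer Choice terms appear in the ranking. The sketch below reproduces that ranking logic using hypothetical estimates and standard errors, not the fitted values.

```python
# Hypothetical (estimate, standard error) pairs, for illustration only;
# these are NOT the fitted coefficients from the post's model.
coefs = {
    "OfferChosenMiddle": (-1.00, 0.10),
    "SeatNumber4": (1.30, 0.18),
    "WeekdayMonday": (1.90, 0.35),
    "CashBuilder": (5.0e-5, 1.2e-5),
}


def z_importance(coefs):
    """Rank model terms by |estimate / standard error|, largest first."""
    scores = {term: abs(est / se) for term, (est, se) in coefs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for term, score in z_importance(coefs):
    print(f"{term:<18} {score:5.2f}")
```

Ranking by the z-statistic rather than the raw coefficient makes terms comparable regardless of units. That matters here because Cash Builder and Money Banked are measured in dollars and so have tiny per-unit coefficients.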
DOES IT MATTER WHERE YOU ARE SEATED?
Given the importance of Seat Number in the model, it’s interesting to look at some of the summary statistics by position. These appear in the table below.
What we find is that there is little variability across Seat Number in terms of the average Cash Builder amount achieved, which suggests that either ability is unrelated to position or that the average difficulty of Cash Builder questions is well-matched to the average ability of the people in each position.
We can get a hint about which of these hypotheses might be better supported by the data by looking at the conditional probability of getting home given the offer chosen. These appear in the columns under the heading ‘Percentage who Get Home if …’, and what we find is that contestants in Seat 4 (and Seat 3) are more likely to get home if they take the Middle or High offer than are those in Seats 1 and 2.
We also find that contestants in Seat 4 have a relatively high probability of getting home when they take the High offer, but that those in Seat 2 have an even greater probability when they do this. These estimates, however, are based on quite small samples (63 in the case of those in Seat 4, and 53 in the case of those in Seat 2) and so are subject to larger sampling variability (roughly +/- 6 to 7 percentage points).
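The quoted +/- 6 to 7 points matches one standard error of a sample proportion, sqrt(p(1 - p)/n), for samples of 63 and 53 with p near 0.5 (the worst case). The sketch below assumes p = 0.5 because the exact proportions aren't restated here. An approximate 95% interval is about 1.96 times as wide.

```python
import math


def proportion_se(p, n):
    """Standard error of a sample proportion p estimated from n observations."""
    return math.sqrt(p * (1.0 - p) / n)


for seat, n in {"Seat 4": 63, "Seat 2": 53}.items():
    se = proportion_se(0.5, n)   # p = 0.5 gives the largest possible SE
    print(f"{seat}: n={n}, SE ~ {se:.1%}, ~95% half-width ~ {1.96 * se:.1%}")
```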
What we’re left with then are the following possibilities:
On average, contestants' abilities are unrelated to Seat Number, as reflected in the average Cash Builder amounts, and the multiple-choice questions are made slightly easier for the contestants in later positions
Stronger contestants tend to be seated in later positions, but Cash Builder questions are adjusted to compensate for this while the multiple-choice questions are not
Cash Builder and multiple-choice abilities are not highly correlated, and contestants who are stronger at multiple-choice questions tend to be seated in later positions, but they are no better at Cash Builder questions than an average contestant
Absent other information, I think it’s hard to choose between these hypotheses. Another one to ask the producers if ever we get the chance.
SUMMARY AND CONCLUSION
We can construct a fairly simple model to estimate the probability that a given contestant progresses, and that model suggests that those chances:
Depend a great deal on which of the offers he or she chooses
Also depend very significantly on the Seat in which he or she was located (possibly due to ability but perhaps also due to how question difficulty is dynamically altered during the course of an episode)
Depend significantly on the contestant’s Cash Builder amount
Depend somewhat on the game situation in terms of the number of contestants already home and how much they have banked
Vary across airing day, possibly due to scheduling choices
Vary far less across the show’s seasons
Depend a little on the Chaser that the contestant is facing