Explaining More of the Variability in the Victory Margin of Finals
This morning, while out walking, I got to wondering about two of the results from the latest post on the Wagers & Tips blog. First, we noted that teams from higher on the ladder have won 20 of the 22 Semi Finals between 2000 and 2010; second, we saw that the TAB bookmaker had installed the winning team as favourite in only 64% of these contests. So, in Semi Finals at least, the bookmaker has relatively often favoured the team that finished lower on the ladder in the home-and-away season, yet these teams have rarely won.
The story's no better if we turn to MARS Ratings to find more discernment: the higher MARS-rated team has also won only 64% of these contests.
That led me to ponder how accurately a Consult The Ladder-style tipster, one that always tips the team that finished higher on the ladder, might have predicted finals results more generally since 2000. Rather startlingly, such a tipster would have correctly predicted just under 75% of winners, a rate a few percentage points better than the TAB bookmaker's and almost 10 percentage points better than simply selecting the higher MARS-rated team.
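The Consult The Ladder rule is simple enough to state as code. The sketch below is a hypothetical illustration, not the calculation behind the figures quoted above: the `FinalResult` record and its field names are made up for the example, and treating a draw as an incorrect tip is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FinalResult:
    home_ladder: int  # home team's home-and-away ladder position (1 = top)
    away_ladder: int  # away team's home-and-away ladder position
    home_margin: int  # home team score minus away team score


def ladder_tipster_accuracy(games: list[FinalResult]) -> float:
    """Fraction of games won by the team that finished higher on the ladder.

    Draws count as incorrect tips (an assumption made for this sketch).
    """
    correct = 0
    for g in games:
        tipped_home = g.home_ladder < g.away_ladder  # lower number = higher on ladder
        home_won = g.home_margin > 0
        away_won = g.home_margin < 0
        if (tipped_home and home_won) or (not tipped_home and away_won):
            correct += 1
    return correct / len(games)
```

For four made-up games, `ladder_tipster_accuracy([FinalResult(1, 4, 20), FinalResult(2, 3, -5), FinalResult(5, 6, 10), FinalResult(3, 2, -1)])` returns 0.75: the tip is wrong only in the second game. Applied to the Semi Finals record above, 20 correct tips from 22 games is a rate of about 91%.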
Well, if ladder position can help select which team is more likely to win, perhaps it can also help predict the margin of victory. So, I thought, let's include it in a regression. I also wondered whether the effects of differences in MARS Ratings, ladder positions and even relative bookmaker favouritism might vary depending on whether the contest was an Elimination Final, Qualifying Final, Semi Final, Preliminary Final or Grand Final, so I included dummy variables for these too.
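As a rough illustration of how final-type dummy variables enter a margin regression, here is a minimal NumPy sketch. It is not the model from this post. The short labels in `FINAL_TYPES`, the choice of the Grand Final as the baseline category, and the helper names are all hypothetical. The sketch adds the dummies as intercept shifts only. Interaction terms (dummy multiplied by predictor) would let the slopes themselves vary by final type, and the post doesn't say which form was used.

```python
import numpy as np

# Hypothetical labels for the five types of final.
FINAL_TYPES = ["EF", "QF", "SF", "PF", "GF"]


def design_matrix(numeric, final_types, baseline="GF"):
    """Build [intercept | numeric predictors | final-type dummies].

    `numeric` is an (n, k) array-like of per-game predictors (e.g. MARS and
    ladder variables). One 0/1 dummy is created for each final type except
    `baseline`, which is absorbed into the intercept to avoid perfect collinearity.
    """
    numeric = np.asarray(numeric, dtype=float)
    dummy_types = [t for t in FINAL_TYPES if t != baseline]
    dummies = np.array(
        [[1.0 if ft == t else 0.0 for t in dummy_types] for ft in final_types]
    )
    intercept = np.ones((numeric.shape[0], 1))
    return np.hstack([intercept, numeric, dummies])


def fit_ols(X, y):
    """Ordinary least squares fit; returns (coefficients, R^2)."""
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2
```

Each dummy coefficient is then read as the average shift in home team margin for that type of final, relative to the baseline type, holding the other predictors fixed.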
The resulting model, shown at right, explains over 24% of the variability in the home team margin. That is over 5 percentage points more than the model described over on the Wagers & Tips blog, which used MARS, bookmaker and venue data. That's a significant improvement, in both the statistical and the everyday sense of the word.
(In this model, the Prob coefficient relates to the implicit TAB bookmaker probability for the home team as indicated by the prices in the head-to-head market.)
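The post doesn't say exactly how the implicit probability was extracted from the head-to-head prices. The sketch below shows one common convention, which is an assumption here: take the reciprocal of each decimal price and rescale both so they sum to 1. This removes the bookmaker's overround proportionally.

```python
def implied_home_probability(home_price: float, away_price: float) -> float:
    """Home team's implied win probability from head-to-head decimal prices.

    Assumes the bookmaker's overround is removed proportionally, i.e. the
    raw implied probabilities 1/price are rescaled to sum to 1. This is one
    common convention, not necessarily the one used for Prob in the post.
    """
    home_raw = 1.0 / home_price
    away_raw = 1.0 / away_price
    return home_raw / (home_raw + away_raw)
```

For example, with prices of $1.50 and $2.60 the raw reciprocals sum to about 1.051, an overround of roughly 5%. The home team's implied probability is then 2.6 / 4.1, or about 0.634.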
Some caution must be exercised in interpreting the individual coefficients in this model due to the high levels of multicollinearity amongst the variables. For example, Own MARS is correlated +0.5 with the Prob variable, -0.8 with the Own Ladder variable and -0.6 with the Opp Ladder variable.
As well, the Opp MARS variable is correlated -0.5 with Own Ladder and -0.7 with Opp Ladder; and Own Ladder and Opp Ladder are correlated +0.6 (which suggests that it is the nature of the finals system to pit teams from similar ladder positions against one another).
One manifestation of this multicollinearity is that only one of these variables, Opp MARS, is statistically significant at even the 10% level, though each almost certainly contributes non-trivially to explaining the home team margin.
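One standard way to quantify how much multicollinearity inflates coefficient standard errors is the variance inflation factor (VIF). The VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The post doesn't report the full correlation matrix, so the sketch below can't reproduce the actual model's VIFs. It only applies the formula to the subset of correlations quoted above.

```python
import numpy as np


def vifs_from_correlation(corr):
    """Variance inflation factors: the diagonal of the inverse correlation matrix.

    Equivalent to VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    predictor j on all the other predictors.
    """
    corr = np.asarray(corr, dtype=float)
    return np.diag(np.linalg.inv(corr))


# Correlations quoted in the post, for a hypothetical regression containing
# only these three predictors, in the order Own MARS, Own Ladder, Opp Ladder.
three_vars = [
    [1.0, -0.8, -0.6],
    [-0.8, 1.0, 0.6],
    [-0.6, 0.6, 1.0],
]
```

With just Own MARS and Own Ladder (r = -0.8), each VIF is 1 / (1 - 0.64), or about 2.78. This means each coefficient's standard error is about 1.67 times what it would be with uncorrelated predictors. For the three-variable matrix above, the VIFs come out at roughly 2.96, 2.96 and 1.67.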
Where such high levels of multicollinearity exist, it becomes more challenging to assess each variable's contribution to explaining the target variable, in this case the home team margin. One statistical approach put forward to deal with exactly this issue is variable importance analysis. In essence, it assesses the relative importance of each variable by measuring how much a fitted model's R-squared changes when that variable is included versus when it is excluded, across models built from different subsets of the other variables.
The two columns on the right of the table above provide these relative importance values for two variants of this approach: LMG and PMVD. Both variants suggest that variability in the TAB bookmaker's implicit home team probability explains the largest proportion of variability in the actual game margin. LMG estimates the bookmaker probability as contributing about one-third of the overall explanatory power of the model; PMVD estimates it at about one-half.
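The post doesn't show how the LMG figures were computed. The sketch below is a from-scratch Python illustration of the LMG idea, not the author's code. It credits each predictor with its increase in R-squared when it is added to a model, averaged over every order in which the predictors could be entered. PMVD is not implemented here: it uses data-dependent weights over those orderings and is more involved. In R, the relaimpo package's `calc.relimp` provides both metrics.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, cols):
    """R^2 from regressing y on an intercept plus the listed columns of X."""
    design = np.hstack([np.ones((len(y), 1)), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def lmg_importance(X, y):
    """LMG importance: each predictor's increase in R^2, averaged over all
    orders in which the predictors could be entered into the model.

    Uses the equivalent subset-weighted form (weight |S|!(p-|S|-1)!/p!), so the
    cost grows with 2^p rather than p!. The shares sum to the full model's R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    cache = {}

    def r2_of(subset):
        key = frozenset(subset)
        if key not in cache:
            # The intercept-only model has R^2 = 0 by definition.
            cache[key] = r_squared(X, y, sorted(key)) if key else 0.0
        return cache[key]

    shares = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                shares[j] += weight * (r2_of(subset + (j,)) - r2_of(subset))
    return shares
```

To express the results as proportions of the model's explained variability, as in the table, divide the shares by their sum, which equals the full model's R-squared.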
Our two MARS Rating variables together account for about 30-35% of the explained variability in margin, the two Ladder position variables for another 10-15%, and the dummy variables denoting the type of final for the remaining 15% or so.
As mentioned, all up the variables explain about 24% of the variability in home team margins. The diagram on the left shows you what 24% of explained variance looks like.
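Another way to get a feel for an R-squared of 24% is to translate it into two more tangible quantities. For an OLS fit with an intercept, the correlation between fitted and actual margins is the square root of R-squared. The in-sample residual spread, as a fraction of the raw spread of margins, is the square root of one minus R-squared. The helper name below is made up for the illustration, and the second figure ignores degrees-of-freedom adjustments.

```python
import math


def r2_in_plain_terms(r2: float) -> tuple[float, float]:
    """Return (correlation of fitted with actual, residual SD / raw SD).

    Correlation = sqrt(R^2) holds for an OLS fit with an intercept; the SD
    ratio sqrt(1 - R^2) ignores degrees-of-freedom adjustments.
    """
    return math.sqrt(r2), math.sqrt(1.0 - r2)
```

For an R-squared of 0.24, this gives a correlation of about 0.49 and a residual SD ratio of about 0.87. In other words, the model trims the typical size of its errors by only around 13% relative to always predicting the average margin.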
That leaves just one last obvious question: what does this model predict for this week's fare? It suggests the Hawks by 35 and the Eagles by 18.