Building Your Own Men's AFL Game Score Progression Simulator
In this (long) blog I’ll walk you through the concepts and R code behind the creation of a fairly simple score progression simulator.
(There’s a link for you to download the entire code yourself at the end of the blog.)
All we’ll be interested in are “events” - period starts, period ends, goals and behinds - and the algorithm will determine for us, given the event that’s just occurred, what the next event is, and how far away in time it will take place.
To be able to do that, the first thing we’re going to need is some data about the typical time between events based on historical games, which we can obtain using the Swiss Army knife of footy data, the fitzRoy R package.
GETTING THE SCORE PROGRESSION DATA
Here’s the code that will extract all of the available score progression data:
library(fitzRoy)
library(dplyr)
# Create a worm receptacle
if (exists("AllWorms")) { rm(AllWorms) }
# Create a dataframe to track missing games
Missed = data.frame(Year = rep(0, 1000),
ID = rep("", 1000))
RowNum = 0
# NB Data is only available on fitzRoy for 2012 onwards
for (SeasonThis in 2012:2024)
{
# Get fixture ids for season
ff = fetch_fixture_afl(SeasonThis)
for (GameThis in ff$providerId)
{
# Catch worms with errors
WormThis = try(fetch_score_worm_data(GameThis), silent = TRUE)
# If the worm length is greater than 1, we have a score progression ...
if(length(WormThis) > 1)
{
# Add some useful details from the fixture data to the current worm
WormThis$Year = ff$compSeason.year[1]
WormThis$RoundNum = ff$round.roundNumber[ff$providerId == GameThis]
WormThis$RoundName = ff$round.name[ff$providerId == GameThis]
WormThis$Venue = ff$venue.name[ff$providerId == GameThis]
WormThis$HomeTeam = ff$home.team.name[ff$providerId == GameThis]
WormThis$AwayTeam = ff$away.team.name[ff$providerId == GameThis]
# Yes, this is a terrible way of progressively adding to a dataframe. Feel free to improve
if (!exists("AllWorms")) { AllWorms = WormThis } else { AllWorms = bind_rows(AllWorms, WormThis) }
}
# If there's no worm, record the missing worm details
if(length(WormThis) == 1)
{
RowNum = RowNum + 1
Missed$Year[RowNum] = SeasonThis
Missed$ID[RowNum] = GameThis
print(paste("Missed: ", RowNum, sep = ""))
}
}
print(paste("Finished Season: ", SeasonThis, sep = ""))
}
write.csv(AllWorms, 'AllWormData.csv', row.names = FALSE)
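The comment in the code above invites improvements to the way AllWorms is grown inside the loop. One common alternative is to collect each game’s worm in a list and bind everything once at the end. Here’s a minimal sketch of that pattern in base R. Note that fetch_worm_stub is a hypothetical stand-in for the fitzRoy call (so that the example runs offline), and the game ids and values are invented.

```r
# Hypothetical stand-in for fetch_score_worm_data(): returns a small data
# frame for most ids and throws an error for one, mimicking a missing worm
fetch_worm_stub = function(game_id) {
  if (game_id == "G2") stop("no worm available")
  data.frame(match_id = game_id, periodSeconds = c(0, 95, 210))
}

game_ids = c("G1", "G2", "G3")
worm_list = list()
missed_ids = character(0)

for (game_id in game_ids) {
  worm = try(fetch_worm_stub(game_id), silent = TRUE)
  if (inherits(worm, "try-error")) {
    # Record the game we could not fetch
    missed_ids = c(missed_ids, game_id)
  } else {
    # Store the worm under its game id
    worm_list[[game_id]] = worm
  }
}

# Bind everything in a single step at the end
all_worms = do.call(rbind, worm_list)
rownames(all_worms) = NULL
```

With dplyr loaded, bind_rows(worm_list) does the same job and fills any columns that differ between games with NA.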
After you run this you should have an object named AllWorms, the first few rows and columns of which will look like the following:
Altogether, AllWorms should have score progressions for 2,626 games (with 38 missing). Check this by counting the number of unique match_id values (ie AllWorms %>% count(match_id), which returns one row per game).
There will be 42 columns, some of the most important of which are, for our current purposes:
match_id: unique identifier for the game
periodNumber: from 1 to 4 (or, very occasionally, 5)
scoreType: “GOAL”, “BEHIND”, “RUSHED_BEHIND” or NA. Is NA for the first and last row of a game’s worm only
homeOrAway: Which team recorded the scoreType
cumulativeSeconds: When did the score occur, relative to the start of the match
GATHERING INTER-EVENT TIME DATA
Don’t let that fancy heading scare you off: all we’ll be doing here is extracting the time taken between each of the following 10 event pairs:
The start of a quarter and either side registering a goal
The start of a quarter and either side registering a behind
One team registering a goal and then another goal
One team registering a goal and then a behind
One team registering a behind and then another behind
One team registering a behind and then a goal
One team registering a goal and then the other team registering a goal
One team registering a goal and then the other team registering a behind
One team registering a behind and then the other team registering a behind
One team registering a behind and then the other team registering a goal
We look at each of these event pairs separately because their distributions are likely to be quite different. Consider, for example, the possible time between two goals being kicked by the same team, where the ball must be taken back to the centre in between, and the possible time between a behind and a goal being registered by the same team, which requires only that the kickout goes awry.
Here’s the code to extract that data:
library(dplyr)
library(HDInterval) # we'll need these later
library(doParallel) # we'll need these later
# Read in the worms data
AllWorms = read.csv('AllWormData.csv')
# Ensure that it's in chronological order for every game
AllWorms = AllWorms %>% arrange(match_id, cumulativeSeconds)
# Create a series of lagged columns that will, among other things, bring together the current event with the previous one
AllWorms_Supp = AllWorms %>% filter(Year != 2020) %>%
group_by(match_id, periodNumber) %>%
mutate(LastEventTime = lag(periodSeconds,1)) %>%
mutate(LastEventType = if_else(is.na(LastEventTime), "Period Start", lag(scoreType,1)), LastEventTeam = if_else(is.na(LastEventTime), "Period Start", lag(teamName,1))) %>%
mutate(TimeBetweenEvents = if_else(is.na(LastEventTime), periodSeconds, periodSeconds - LastEventTime))
# Discard any row that's missing a Time Between Events or where that time difference is more than 1800 seconds
AllWorms_SuccessiveEventsData = AllWorms_Supp %>% filter(!is.na(TimeBetweenEvents) & TimeBetweenEvents <= 1800) %>%
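# The first score of each period has LastEventTeam set to "Period Start" (not NA) above, so test for both
mutate(LastEventWhich = if_else(is.na(LastEventTeam) | LastEventTeam == "Period Start", "Period Start",
if_else(LastEventTeam == teamName, "Same Team", "Other Team")))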
AllWorms_SuccessiveEventsData$scoreType = ifelse(is.na(AllWorms_SuccessiveEventsData$scoreType), "None",
ifelse(AllWorms_SuccessiveEventsData$scoreType == "GOAL", "Goal", "Behind"))
AllWorms_SuccessiveEventsData$LastEventType = ifelse(is.na(AllWorms_SuccessiveEventsData$LastEventType) |
AllWorms_SuccessiveEventsData$LastEventType == "Period Start", "Period Start",
ifelse(AllWorms_SuccessiveEventsData$LastEventType == "GOAL", "Goal", "Behind"))
AllWorms_SuccessiveEventsData$EventSequence = paste(AllWorms_SuccessiveEventsData$LastEventType, " then ", AllWorms_SuccessiveEventsData$scoreType, sep = "")
The AllWorms_SuccessiveEventsData dataframe is the AllWorms dataframe with six new columns that look like the following.
It’s in this format that we can gather complete time samples for each of the 10 event pairs we care about. For example, the second row gives us an example of the Period Start then Goal event pair, with a time between them of 81 seconds. The third row gives an example of a goal being followed by a behind from the same team: here it took 80 seconds.
We’ll come back later and discuss exactly how we gather, summarise and use this data.
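To make the lagging logic concrete, here’s a small base R sketch that applies the same idea to an invented five-row worm. It’s an illustration rather than the code used for the blog: the data are made up, the start-of-game row is omitted, and the helper lag1 is my own stand-in for dplyr’s lag.

```r
# Toy worm: one game, two periods (all values invented for illustration)
toy = data.frame(
  match_id      = "G1",
  periodNumber  = c(1, 1, 1, 2, 2),
  periodSeconds = c(81, 161, 400, 50, 300),
  scoreType     = c("Goal", "Behind", "Goal", "Behind", "Goal"),
  teamName      = c("A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Lag a vector by one position, padding the front with NA
lag1 = function(x) c(NA, head(x, -1))

# Lag within each match and period, so that the first score of a period
# looks back to the period start rather than to the previous period
grp = interaction(toy$match_id, toy$periodNumber, drop = TRUE)
last_time = ave(toy$periodSeconds, grp, FUN = lag1)
last_type = ave(toy$scoreType, grp, FUN = lag1)
last_team = ave(toy$teamName, grp, FUN = lag1)

first_in_period = is.na(last_time)
toy$TimeBetweenEvents = ifelse(first_in_period, toy$periodSeconds,
                               toy$periodSeconds - last_time)
toy$LastEventType = ifelse(first_in_period, "Period Start", last_type)
toy$LastEventWhich = ifelse(first_in_period, "Period Start",
                            ifelse(last_team == toy$teamName, "Same Team", "Other Team"))
toy$EventSequence = paste(toy$LastEventType, "then", toy$scoreType)

# One summary row per event pair, here the median time between events
pair_summary = aggregate(TimeBetweenEvents ~ EventSequence + LastEventWhich,
                         data = toy, FUN = median)
```

Each row of toy now carries the event pair it belongs to and the time between the two events, which is all we need to build a sample for each pair.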
GATHERING QUARTER LENGTH DATA
We also need to model the end of quarters, for which purpose we’ll fit a simple ordinary least squares model to historical quarter length data, using the period number and the number of goals and behinds registered as explanatory variables (regressors).
QtrTimes = AllWorms %>% filter(Year != 2020) %>% filter(periodNumber < 5) %>% group_by(periodNumber, match_id) %>%
summarise(Examples = length(match_id),
Length = TotalPeriodSeconds[1],
NumberGoals = sum(scoreType == "GOAL", na.rm = TRUE),
# Count behinds only (including rushed behinds); goals are already counted in NumberGoals
NumberBehinds = sum(scoreType %in% c("BEHIND", "RUSHED_BEHIND"), na.rm = TRUE))
model_rel_mean = lm(Length ~ periodNumber + NumberGoals + NumberBehinds, data = QtrTimes)
# Length ~ 1,593 + 4.6 x periodNumber + 33 x NumberGoals + 2.6 x NumberBehinds
sd_dev = sd(QtrTimes$Length - predict(model_rel_mean))
# 107.8851
We’ll use this information to set an initial length for every quarter by drawing randomly from a Normal distribution with a mean of 1,593 + 4.6 x the period number seconds, and a standard deviation of 108 seconds.
Then, during the simulation, we’ll add 33 seconds to the quarter length as each goal is scored, and 3 seconds as each behind is scored.
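As a rough sketch of how those two steps fit together, the snippet below draws an initial length for a quarter and then extends it as scores arrive. It hard-codes the rounded coefficients quoted in the code comment above rather than reading them from the fitted model, and the function names initial_length and extend_length are my own, purely for illustration.

```r
set.seed(1)

# Rounded coefficients as quoted above; treat them as illustrative
base_secs  = 1593
per_period = 4.6
per_goal   = 33
per_behind = 2.6
resid_sd   = 108

# Initial length of a quarter: a Normal draw around the period-specific mean
initial_length = function(period) {
  round(rnorm(1, mean = base_secs + per_period * period, sd = resid_sd))
}

# Extend the running quarter length after a score
extend_length = function(q_length, score_type) {
  add = if (score_type == "Goal") per_goal else per_behind
  round(q_length + add)
}

start_len = initial_length(3)
q_len = start_len
for (ev in c("Goal", "Behind", "Goal")) {
  q_len = extend_length(q_len, ev)
}

# Two goals and one behind add 33 + 3 + 33 seconds after rounding
added = q_len - start_len
```

Because the running length is rounded after every event, each behind effectively adds 3 seconds, as described above.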
Let’s move on then to that simulation.
SIMULATING GAMES
Rather than providing you with code that simulates a single game, I’ll instead provide code that will let you do it for a number of games, and in parallel. In particular, I’m going to simulate all of the games in the 2024 season 100 times, taking the inputs for expected scoring shots, expected margin, and expected conversion from my MoSHBODS model.
Because of its length, here we’ll review the code in chunks.
# Read in the MoSHBODS expected scoring, margin and conversion data
StatData = read.csv('MoSH_2024Season.csv')
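# Set for reproducibility. Change to get fresh random data
# NB: this seeds the main session only. The parallel workers created below draw from their own
# random streams, so runs may still differ unless worker seeds are also managed (eg with the doRNG package)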
set.seed(57834)
# Play each of the 216 games 100 times
reps = 21600
# Start a timer
t_start = Sys.time()
# Create a cluster
registerDoParallel(cl <- makeCluster(20))
# Note that each pass will simulate a single game
results_list = foreach(rep_num = 1:reps, .packages = c("dplyr", "HDInterval")) %dopar% {
# Cycle through the 216 games in order
chosen_game = rep_num %% 216
if (chosen_game == 0) { chosen_game = 216 }
# Set Desired number of Scoring Shots
ExpSS = StatData$ExpSS[chosen_game]
# Set Desired Margin (from home team perspective)
ExpMargin = StatData$ExpMarg[chosen_game]
# Desired Conversion Rate
# We've assumed the same, fixed conversion rate for the home and away teams here
ExpConvRate = StatData$ExpConv[chosen_game]
# Calculate the implied home team share of scoring shots
# We will use this value as the fixed probability that the next scoring shot is by the home team
ExpMarginAsSSDiff = ExpMargin/(ExpConvRate*5+1)
ExpSS_Home = (ExpMarginAsSSDiff + ExpSS)/2
ExpSS_Away = ExpSS - ExpSS_Home
ExpHomeShare_SS = ExpSS_Home/ExpSS
# Generate initial base quarter lengths
ExpLengths = c(round(rnorm(1,model_rel_mean$coef[1] + model_rel_mean$coef[2]*1, sd_dev),0),
round(rnorm(1,model_rel_mean$coef[1] + model_rel_mean$coef[2]*2, sd_dev),0),
round(rnorm(1,model_rel_mean$coef[1] + model_rel_mean$coef[2]*3, sd_dev),0),
round(rnorm(1,model_rel_mean$coef[1] + model_rel_mean$coef[2]*4, sd_dev),0))
# Create a receptacle for the simulation data for a single game. Each row will be an event
Worm = data.frame(Period = rep(0,200),
EventNum = rep(0,200),
ElapsedTime = rep(0,200),
ScoreTeam = rep("",200),
ScoreType = rep("",200))
EventNum = 0
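The conversion from an expected margin to a home team share of scoring shots in the code above is easier to follow with numbers. A scoring shot is worth 6 points with probability equal to the conversion rate and 1 point otherwise, so on average it’s worth 5 x the conversion rate + 1 points. The inputs below are invented for illustration and are not taken from the MoSHBODS file.

```r
# Illustrative inputs (invented)
ExpSS       = 50    # expected total scoring shots in the game
ExpMargin   = 12    # expected home team margin in points
ExpConvRate = 0.53  # probability that a scoring shot is a goal

# Expected points per scoring shot: 6p + (1 - p) = 5p + 1
PointsPerSS = 5 * ExpConvRate + 1

# Express the margin in scoring shots, then split the total shots so that
# the home team has that many more than the away team
ExpMarginAsSSDiff = ExpMargin / PointsPerSS
ExpSS_Home = (ExpSS + ExpMarginAsSSDiff) / 2
ExpSS_Away = ExpSS - ExpSS_Home
ExpHomeShare_SS = ExpSS_Home / ExpSS

# Check: converting the shot difference back to points recovers the margin
implied_margin = (ExpSS_Home - ExpSS_Away) * PointsPerSS
```

Here the home team is expected to take a little over 53% of the scoring shots, and that share becomes the fixed probability that any given scoring shot belongs to the home team.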
We now have everything set up ready to run the event-by-event part of the simulation.
for (QtrNum in 1:4)
{
ElapsedTime = 0
# Initialise the Quarter Length
QLength = ExpLengths[QtrNum]
# Create the first event row for the quarter, which is always the same
EventNum = EventNum + 1
Worm$Period[EventNum] = QtrNum
Worm$ElapsedTime[EventNum] = 0
Worm$ScoreTeam[EventNum] = "Period Start"
Worm$ScoreType[EventNum] = "Period Start"
QtrEnded = FALSE
while (!QtrEnded)
{
EventNum = EventNum + 1
# Get preliminary next scorer and score type
if (runif(1,0,1) < ExpHomeShare_SS)
{
Worm$ScoreTeam[EventNum] = "Home"
} else
{
Worm$ScoreTeam[EventNum] = "Away"
}
if (runif(1,0,1) < ExpConvRate)
{
Worm$ScoreType[EventNum] = "Goal"
} else
{
Worm$ScoreType[EventNum] = "Behind"
}
# Get elapsed time to score for whichever of the 10 event pairs is applicable
# These could all be in a single nested if-else, but that would look ugly.
# The option of a CASE statement would be lovely
if (Worm$ScoreTeam[EventNum-1] == "Period Start" & Worm$ScoreType[EventNum] == "Goal")
{
# We extract the relevant historical data given the pair of events we are considering
RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Period Start then Goal", LastEventWhich == "Period Start")
# We choose a random point on the inverseCDF formed from the data
# Think of this as selecting a random value from the entire sample
# Creating a CDF just covers over any gaps
TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
}
if (Worm$ScoreTeam[EventNum-1] == "Period Start" & Worm$ScoreType[EventNum] == "Behind")
{
RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Period Start then Behind", LastEventWhich == "Period Start")
TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
}
if (Worm$ScoreTeam[EventNum-1] != "Period Start")
{
if (Worm$ScoreTeam[EventNum] == Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Goal" & Worm$ScoreType[EventNum-1] == "Goal")
{
RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Goal then Goal", LastEventWhich == "Same Team")
TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
}
if (Worm$ScoreTeam[EventNum] == Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Goal" & Worm$ScoreType[EventNum-1] == "Behind")
{
# Previous event was a Behind and this one is a Goal, so the matching pair is "Behind then Goal"
RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Behind then Goal", LastEventWhich == "Same Team")
TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
}
if (Worm$ScoreTeam[EventNum] == Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Behind" & Worm$ScoreType[EventNum-1] == "Behind")
{
RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Behind then Behind", LastEventWhich == "Same Team")
TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
}
if (Worm$ScoreTeam[EventNum] == Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Behind" & Worm$ScoreType[EventNum-1] == "Goal")
{
# Previous event was a Goal and this one is a Behind, so the matching pair is "Goal then Behind"
RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Goal then Behind", LastEventWhich == "Same Team")
TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
}
if (Worm$ScoreTeam[EventNum] != Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Goal" & Worm$ScoreType[EventNum-1] == "Goal")
{
RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Goal then Goal", LastEventWhich == "Other Team")
TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
}
if (Worm$ScoreTeam[EventNum] != Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Goal" & Worm$ScoreType[EventNum-1] == "Behind")
{
# Previous event was a Behind and this one is a Goal, so the matching pair is "Behind then Goal"
RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Behind then Goal", LastEventWhich == "Other Team")
TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
}
if (Worm$ScoreTeam[EventNum] != Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Behind" & Worm$ScoreType[EventNum-1] == "Behind")
{
RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Behind then Behind", LastEventWhich == "Other Team")
TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
}
if (Worm$ScoreTeam[EventNum] != Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Behind" & Worm$ScoreType[EventNum-1] == "Goal")
{
# Previous event was a Goal and this one is a Behind, so the matching pair is "Goal then Behind"
RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Goal then Behind", LastEventWhich == "Other Team")
TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
}
}
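The ten if blocks above all do the same two things: pick the historical sample for the relevant event pair, then draw a time from it. One possible tidy-up, sketched below on invented data, is to build a keyed lookup of samples once and then draw from it with a single helper. Note that this sketch swaps in base R’s quantile with type = 1 (which inverts the empirical CDF and so always returns an observed value) in place of HDInterval’s inverseCDF, and the names time_lookup and draw_time are hypothetical.

```r
set.seed(42)

# Toy stand-in for AllWorms_SuccessiveEventsData (values invented)
events = data.frame(
  EventSequence     = rep(c("Goal then Goal", "Behind then Goal"), each = 5),
  LastEventWhich    = rep(c("Same Team", "Other Team"), each = 5),
  TimeBetweenEvents = c(70, 95, 120, 180, 260, 20, 45, 60, 110, 200)
)

# Build the lookup once, before simulating: one vector of times per event pair
key = paste(events$EventSequence, events$LastEventWhich, sep = " | ")
time_lookup = split(events$TimeBetweenEvents, key)

# Draw a time by inverting the empirical CDF at a uniform random point
draw_time = function(sequence, which) {
  times = time_lookup[[paste(sequence, which, sep = " | ")]]
  unname(quantile(times, probs = runif(1), type = 1))
}

t1 = draw_time("Behind then Goal", "Other Team")
t2 = draw_time("Goal then Goal", "Same Team")
```

Besides being shorter, a lookup like this avoids filtering the full historical data frame every time an event is simulated.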
At this point we have a candidate next event, but we’ve not checked to see if there’s time for it before the quarter ends.
In the first version of this script, I simply checked to see if the elapsed time plus the between-event time was greater than the current quarter length, and rejected the event if so. It struck me that this might disproportionately affect goal-scoring because the average goal-to-goal time for same or opposite teams, and the behind-to-goal time for opposite teams, were longer than the goal-to-behind or behind-to-behind times. As such, a goal was more likely to be rejected than a behind, which would tend to drag the conversion rate down.
So, in the final version, I allow a candidate final goal if it would be scored within 30 seconds of the current quarter length. Think of it as a goal after the siren.
In the simulations for the 2024 season, about 40% of simulated games had a goal after one or more of the 4 sirens.
So, let’s handle the end of quarters.
if (ElapsedTime + TimeToScore <= QLength)
{
ElapsedTime = ElapsedTime + TimeToScore
Worm$ElapsedTime[EventNum] = ElapsedTime
Worm$Period[EventNum] = QtrNum
Worm$EventNum[EventNum] = EventNum
# Extend the quarter length based on the event type
if(Worm$ScoreType[EventNum] == "Goal")
{
QLength = round(QLength + model_rel_mean$coef[3],0)
} else
{
QLength = round(QLength + model_rel_mean$coef[4],0)
}
} else
{
# Allow a goal at quarter end to extend the quarter by no more than 30 seconds
if ((Worm$ScoreType[EventNum] == "Behind") | (ElapsedTime + TimeToScore > QLength + 30))
{
Worm$Period[EventNum] = QtrNum
Worm$EventNum[EventNum] = EventNum
Worm$ScoreTeam[EventNum] = "Period End"
Worm$ScoreType[EventNum] = "Period End"
Worm$ElapsedTime[EventNum] = QLength
QtrEnded = TRUE
} else
{
ElapsedTime = ElapsedTime + TimeToScore
QLength = ElapsedTime
Worm$ElapsedTime[EventNum] = ElapsedTime
Worm$Period[EventNum] = QtrNum
Worm$EventNum[EventNum] = EventNum
# Create simultaneous end-of-period event
EventNum = EventNum + 1
Worm$Period[EventNum] = QtrNum
Worm$EventNum[EventNum] = EventNum
Worm$ScoreTeam[EventNum] = "Period End"
Worm$ScoreType[EventNum] = "Period End"
Worm$ElapsedTime[EventNum] = QLength
QtrEnded = TRUE
}
}
}
}
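Stripped of the bookkeeping, the end-of-quarter rule above comes down to a three-way decision. Here’s a small sketch of just that decision as a standalone function. The function name and its return labels are my own and aren’t part of the simulation code.

```r
# Decide what happens to a candidate score given the time left in the quarter.
# Returns "score", "score_after_siren" or "period_end".
resolve_candidate = function(elapsed, time_to_score, q_length, score_type, grace = 30) {
  event_time = elapsed + time_to_score
  # There is time for the score before the quarter ends
  if (event_time <= q_length) return("score")
  # Only a goal may land in the grace window after the siren
  if (score_type == "Goal" && event_time <= q_length + grace) return("score_after_siren")
  # Otherwise the candidate is discarded and the quarter ends
  "period_end"
}

resolve_candidate(1500, 60, 1600, "Behind")  # in time
resolve_candidate(1590, 25, 1600, "Goal")    # goal within 30 seconds of the siren
resolve_candidate(1590, 25, 1600, "Behind")  # behinds get no grace period
resolve_candidate(1590, 45, 1600, "Goal")    # too late, even for a goal
```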
As a final step, we process the completed simulated games and add some useful cumulative data:
# Keep only the used rows of Worm
Worm = Worm[1:EventNum,]
Worm$HomeGoals = 0
Worm$HomeBehinds = 0
Worm$AwayGoals = 0
Worm$AwayBehinds = 0
for (rn in 2:nrow(Worm))
{
if (!Worm$ScoreTeam[rn] %in% c("Period Start", "Period End"))
{
if (Worm$ScoreTeam[rn] == "Home")
{
if (Worm$ScoreType[rn] == "Goal")
{
Worm$HomeGoals[rn] = Worm$HomeGoals[rn-1] + 1
Worm$HomeBehinds[rn] = Worm$HomeBehinds[rn-1]
Worm$AwayGoals[rn] = Worm$AwayGoals[rn-1]
Worm$AwayBehinds[rn] = Worm$AwayBehinds[rn-1]
} else
{
Worm$HomeGoals[rn] = Worm$HomeGoals[rn-1]
Worm$HomeBehinds[rn] = Worm$HomeBehinds[rn-1] + 1
Worm$AwayGoals[rn] = Worm$AwayGoals[rn-1]
Worm$AwayBehinds[rn] = Worm$AwayBehinds[rn-1]
}
} else
{
if (Worm$ScoreType[rn] == "Goal")
{
Worm$HomeGoals[rn] = Worm$HomeGoals[rn-1]
Worm$HomeBehinds[rn] = Worm$HomeBehinds[rn-1]
Worm$AwayGoals[rn] = Worm$AwayGoals[rn-1] + 1
Worm$AwayBehinds[rn] = Worm$AwayBehinds[rn-1]
} else
{
Worm$HomeGoals[rn] = Worm$HomeGoals[rn-1]
Worm$HomeBehinds[rn] = Worm$HomeBehinds[rn-1]
Worm$AwayGoals[rn] = Worm$AwayGoals[rn-1]
Worm$AwayBehinds[rn] = Worm$AwayBehinds[rn-1] + 1
}
}
} else
{
Worm$HomeGoals[rn] = Worm$HomeGoals[rn-1]
Worm$HomeBehinds[rn] = Worm$HomeBehinds[rn-1]
Worm$AwayGoals[rn] = Worm$AwayGoals[rn-1]
Worm$AwayBehinds[rn] = Worm$AwayBehinds[rn-1]
}
}
Worm$HomeScore = 6*Worm$HomeGoals + Worm$HomeBehinds
Worm$AwayScore = 6*Worm$AwayGoals + Worm$AwayBehinds
Worm$HomeMargin = Worm$HomeScore - Worm$AwayScore
Worm$TotalElapsedTime = ifelse(Worm$Period == 1, Worm$ElapsedTime,
ifelse(Worm$Period == 2, Worm$ElapsedTime + max(Worm$ElapsedTime[Worm$Period == 1]),
ifelse(Worm$Period == 3, Worm$ElapsedTime + max(Worm$ElapsedTime[Worm$Period == 1]) + max(Worm$ElapsedTime[Worm$Period == 2]),
Worm$ElapsedTime + max(Worm$ElapsedTime[Worm$Period == 1]) + max(Worm$ElapsedTime[Worm$Period == 2]) + max(Worm$ElapsedTime[Worm$Period == 3]))))
Worm$RepNum = rep_num
Worm$ExpectedMargin = ExpMargin
Worm$ExpectedSS = ExpSS
Worm$ExpectedConv = ExpConvRate
Worm$HomeShareSS = ExpHomeShare_SS
return(Worm)
}
stopCluster(cl)
sim_results = do.call(rbind.data.frame, results_list)
print(Sys.time() - t_start)
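The row-by-row loop that builds the running goal and behind tallies could also be written with cumsum, which some readers may find easier to follow. Here’s a sketch on an invented seven-row worm. It assumes, as in the simulation, that periods are numbered consecutively from 1.

```r
# Toy simulated worm covering two quarters (values invented)
worm = data.frame(
  Period      = c(1, 1, 1, 1, 2, 2, 2),
  ElapsedTime = c(0, 90, 400, 1650, 0, 130, 1700),
  ScoreTeam   = c("Period Start", "Home", "Away", "Period End",
                  "Period Start", "Home", "Period End"),
  ScoreType   = c("Period Start", "Goal", "Behind", "Period End",
                  "Period Start", "Behind", "Period End")
)

# Running tallies: cumsum() over a logical vector counts events to date
worm$HomeGoals   = cumsum(worm$ScoreTeam == "Home" & worm$ScoreType == "Goal")
worm$HomeBehinds = cumsum(worm$ScoreTeam == "Home" & worm$ScoreType == "Behind")
worm$AwayGoals   = cumsum(worm$ScoreTeam == "Away" & worm$ScoreType == "Goal")
worm$AwayBehinds = cumsum(worm$ScoreTeam == "Away" & worm$ScoreType == "Behind")

worm$HomeScore  = 6 * worm$HomeGoals + worm$HomeBehinds
worm$AwayScore  = 6 * worm$AwayGoals + worm$AwayBehinds
worm$HomeMargin = worm$HomeScore - worm$AwayScore

# Total elapsed time: add the lengths of all earlier periods
period_len = tapply(worm$ElapsedTime, worm$Period, max)
offset = unname(c(0, cumsum(period_len)))[worm$Period]
worm$TotalElapsedTime = worm$ElapsedTime + offset
```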
CREATING METRICS FOR EACH SIMULATED GAME
The final thing we’ll do is create a range of metrics that will summarise each simulated game, for example, the number of goal streaks recorded by each team, and the longest for each.
We’ll do this using the following function:
GameStats = function(Worm)
{
Worm$HomeMarginLag = lag(Worm$HomeMargin,1)
Worm$HomeMarginLag2 = lag(Worm$HomeMargin,2)
HomeLeadTime = 0
AwayLeadTime = 0
DrawnTime = 0
UnderSixLeadTime = 0
MarginSecs = 0
AbsMarginSecs = 0
Worm$HomeScoreStreak = 0
Worm$HomeGoalStreak = 0
Worm$AwayScoreStreak = 0
Worm$AwayGoalStreak = 0
Worm$HomeScoreStreakStatus = "Off"
Worm$HomeGoalStreakStatus = "Off"
Worm$AwayScoreStreakStatus = "Off"
Worm$AwayGoalStreakStatus = "Off"
Worm = Worm %>% mutate(LastEventTime = lag(TotalElapsedTime, 1)) %>%
mutate(LastHomeMargin = if_else(is.na(LastEventTime), NA_real_, lag(HomeMargin,1))) %>%
mutate(TimeBetweenEvents = if_else(is.na(LastEventTime), TotalElapsedTime, TotalElapsedTime - LastEventTime))
LastLeader = "None"
Worm$IsLeadChange = "No"
for (rn in 2:nrow(Worm))
{
if (Worm$LastHomeMargin[rn] == 0)
{
DrawnTime = DrawnTime + Worm$TimeBetweenEvents[rn]
} else
{
if (Worm$LastHomeMargin[rn] > 0)
{
HomeLeadTime = HomeLeadTime + Worm$TimeBetweenEvents[rn]
} else
{
AwayLeadTime = AwayLeadTime + Worm$TimeBetweenEvents[rn]
}
}
if ((Worm$HomeMargin[rn] > 0 & LastLeader == "Away") | (Worm$HomeMargin[rn] < 0 & LastLeader == "Home"))
{
Worm$IsLeadChange[rn] = "Yes"
}
if (Worm$HomeMargin[rn] > 0) { LastLeader = "Home" }
if (Worm$HomeMargin[rn] < 0) { LastLeader = "Away" }
if (!is.na(Worm$HomeMarginLag[rn]))
{
MarginSecs = MarginSecs + Worm$HomeMarginLag[rn] * Worm$TimeBetweenEvents[rn]
AbsMarginSecs = AbsMarginSecs + abs(Worm$HomeMarginLag[rn]) * Worm$TimeBetweenEvents[rn]
}
if (!is.na(Worm$HomeMarginLag[rn]) & abs(Worm$HomeMarginLag[rn]) < 6)
{
UnderSixLeadTime = UnderSixLeadTime + Worm$TimeBetweenEvents[rn]
}
### Calculate streak data
if (Worm$ScoreType[rn] %in% c("Period End", "Period Start"))
{
Worm$HomeScoreStreak[rn] = Worm$HomeScoreStreak[rn-1]
Worm$HomeGoalStreak[rn] = Worm$HomeGoalStreak[rn-1]
Worm$AwayScoreStreak[rn] = Worm$AwayScoreStreak[rn-1]
Worm$AwayGoalStreak[rn] = Worm$AwayGoalStreak[rn-1]
Worm$HomeScoreStreakStatus[rn] = Worm$HomeScoreStreakStatus[rn-1]
Worm$HomeGoalStreakStatus[rn] = Worm$HomeGoalStreakStatus[rn-1]
Worm$AwayScoreStreakStatus[rn] = Worm$AwayScoreStreakStatus[rn-1]
Worm$AwayGoalStreakStatus[rn] = Worm$AwayGoalStreakStatus[rn-1]
}
if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Home" & Worm$HomeGoalStreak[rn-1] == 0 & Worm$HomeScoreStreak[rn-1] == 0)
{
Worm$HomeScoreStreak[rn] = 6
Worm$HomeGoalStreak[rn] = 1
Worm$HomeScoreStreakStatus[rn] = "On"
Worm$HomeGoalStreakStatus[rn] = "On"
}
if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Home" & Worm$HomeGoalStreak[rn-1] == 0 & Worm$HomeScoreStreak[rn-1] > 0)
{
Worm$HomeScoreStreak[rn] = Worm$HomeScoreStreak[rn-1] + 6
Worm$HomeGoalStreak[rn] = 1
Worm$HomeScoreStreakStatus[rn] = "Continuing"
Worm$HomeGoalStreakStatus[rn] = "On"
}
if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Home" & Worm$HomeGoalStreak[rn-1] > 0)
{
Worm$HomeScoreStreak[rn] = Worm$HomeScoreStreak[rn-1] + 6
Worm$HomeGoalStreak[rn] = Worm$HomeGoalStreak[rn-1] + 1
Worm$HomeScoreStreakStatus[rn] = "Continuing"
Worm$HomeGoalStreakStatus[rn] = "Continuing"
}
# Ensure that a behind doesn't end a goal streak for either team
if (Worm$ScoreType[rn] == "Behind" & Worm$ScoreTeam[rn] == "Home")
{
Worm$HomeScoreStreak[rn] = Worm$HomeScoreStreak[rn-1] + 1
Worm$HomeGoalStreak[rn] = Worm$HomeGoalStreak[rn-1]
Worm$AwayGoalStreak[rn] = Worm$AwayGoalStreak[rn-1]
Worm$HomeScoreStreakStatus[rn] = "Continuing"
Worm$HomeGoalStreakStatus[rn] = Worm$HomeGoalStreakStatus[rn-1]
Worm$AwayGoalStreakStatus[rn] = Worm$AwayGoalStreakStatus[rn-1]
}
##
if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Away" & Worm$AwayGoalStreak[rn-1] == 0 & Worm$AwayScoreStreak[rn-1] == 0)
{
Worm$AwayScoreStreak[rn] = 6
Worm$AwayGoalStreak[rn] = 1
Worm$AwayScoreStreakStatus[rn] = "On"
Worm$AwayGoalStreakStatus[rn] = "On"
}
if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Away" & Worm$AwayGoalStreak[rn-1] == 0 & Worm$AwayScoreStreak[rn-1] > 0)
{
Worm$AwayScoreStreak[rn] = Worm$AwayScoreStreak[rn-1] + 6
Worm$AwayGoalStreak[rn] = 1
Worm$AwayScoreStreakStatus[rn] = "Continuing"
Worm$AwayGoalStreakStatus[rn] = "On"
}
if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Away" & Worm$AwayGoalStreak[rn-1] > 0)
{
Worm$AwayScoreStreak[rn] = Worm$AwayScoreStreak[rn-1] + 6
Worm$AwayGoalStreak[rn] = Worm$AwayGoalStreak[rn-1] + 1
Worm$AwayScoreStreakStatus[rn] = "Continuing"
Worm$AwayGoalStreakStatus[rn] = "Continuing"
}
# Ensure that a behind doesn't end a goal streak for either team
if (Worm$ScoreType[rn] == "Behind" & Worm$ScoreTeam[rn] == "Away")
{
Worm$AwayScoreStreak[rn] = Worm$AwayScoreStreak[rn-1] + 1
Worm$AwayGoalStreak[rn] = Worm$AwayGoalStreak[rn-1]
Worm$HomeGoalStreak[rn] = Worm$HomeGoalStreak[rn-1]
Worm$AwayScoreStreakStatus[rn] = "Continuing"
Worm$AwayGoalStreakStatus[rn] = Worm$AwayGoalStreakStatus[rn-1]
Worm$HomeGoalStreakStatus[rn] = Worm$HomeGoalStreakStatus[rn-1]
}
}
for (rn in 1:(nrow(Worm)-1))
{
if (Worm$HomeScoreStreakStatus[rn] %in% c("On", "Continuing") & Worm$HomeScoreStreakStatus[rn+1] == "Off") { Worm$HomeScoreStreakStatus[rn] = "Ended" }
if (Worm$HomeGoalStreakStatus[rn] %in% c("On", "Continuing") & Worm$HomeGoalStreakStatus[rn+1] == "Off") { Worm$HomeGoalStreakStatus[rn] = "Ended" }
if (Worm$AwayScoreStreakStatus[rn] %in% c("On", "Continuing") & Worm$AwayScoreStreakStatus[rn+1] == "Off") { Worm$AwayScoreStreakStatus[rn] = "Ended" }
if (Worm$AwayGoalStreakStatus[rn] %in% c("On", "Continuing") & Worm$AwayGoalStreakStatus[rn+1] == "Off") { Worm$AwayGoalStreakStatus[rn] = "Ended" }
}
# Any streak still running on the final row ends with the game (this only needs checking once)
if (Worm$HomeScoreStreakStatus[nrow(Worm)] %in% c("On", "Continuing")) { Worm$HomeScoreStreakStatus[nrow(Worm)] = "Ended" }
if (Worm$HomeGoalStreakStatus[nrow(Worm)] %in% c("On", "Continuing")) { Worm$HomeGoalStreakStatus[nrow(Worm)] = "Ended" }
if (Worm$AwayScoreStreakStatus[nrow(Worm)] %in% c("On", "Continuing")) { Worm$AwayScoreStreakStatus[nrow(Worm)] = "Ended" }
if (Worm$AwayGoalStreakStatus[nrow(Worm)] %in% c("On", "Continuing")) { Worm$AwayGoalStreakStatus[nrow(Worm)] = "Ended" }
Stats = data.frame(GameLength = max(Worm$TotalElapsedTime),
# Read final scores from the last row. After a goal on the final siren, two rows share the
# maximum elapsed time, and matching on that time would return two values per game
HomeGoals = Worm$HomeGoals[nrow(Worm)],
HomeBehinds = Worm$HomeBehinds[nrow(Worm)],
AwayGoals = Worm$AwayGoals[nrow(Worm)],
AwayBehinds = Worm$AwayBehinds[nrow(Worm)],
HomeScore = Worm$HomeScore[nrow(Worm)],
AwayScore = Worm$AwayScore[nrow(Worm)],
HomeMargin = Worm$HomeScore[nrow(Worm)] - Worm$AwayScore[nrow(Worm)],
HomeLeadTime = HomeLeadTime,
AwayLeadTime = AwayLeadTime,
DrawnTime = DrawnTime,
LeadChangesTotal = sum(Worm$IsLeadChange == "Yes"),
LeadChangesQ1 = sum(Worm$IsLeadChange == "Yes" & Worm$Period == 1),
LeadChangesQ2 = sum(Worm$IsLeadChange == "Yes" & Worm$Period == 2),
LeadChangesQ3 = sum(Worm$IsLeadChange == "Yes" & Worm$Period == 3),
LeadChangesQ4 = sum(Worm$IsLeadChange == "Yes" & Worm$Period == 4),
AveHomeLead = MarginSecs/max(Worm$TotalElapsedTime),
AveLead = AbsMarginSecs/max(Worm$TotalElapsedTime),
MaxHomeLead = max(Worm$HomeMargin[Worm$ScoreType != "Period Start"], na.rm = TRUE),
LatestTimeOfMaxHomeLead = max(Worm$TotalElapsedTime[Worm$HomeMargin == max(Worm$HomeMargin, na.rm = TRUE)], na.rm = TRUE),
MaxAwayLead = -min(Worm$HomeMargin[Worm$ScoreType != "Period Start"], na.rm = TRUE),
LatestTimeOfMaxAwayLead = max(Worm$TotalElapsedTime[Worm$HomeMargin == min(Worm$HomeMargin, na.rm = TRUE)], na.rm = TRUE),
TimeLastLeadChange = max(Worm$TotalElapsedTime[Worm$IsLeadChange == "Yes"], na.rm = TRUE),
TimeLeadUnderAGoal = UnderSixLeadTime)
Stats$SSperMinute = (Stats$HomeGoals + Stats$HomeBehinds + Stats$AwayGoals + Stats$AwayBehinds)/Stats$GameLength * 60
Stats$PointsperMinute = (6*Stats$HomeGoals + Stats$HomeBehinds + 6*Stats$AwayGoals + Stats$AwayBehinds)/Stats$GameLength * 60
Stats$PointsperSS = Stats$PointsperMinute/Stats$SSperMinute
Stats$NumberHomeGoalStreaks = sum(Worm$HomeGoalStreakStatus == "Ended", na.rm = TRUE)
Stats$AveLengthHomeGoalStreaks = sum(Worm$HomeGoalStreak[Worm$HomeGoalStreakStatus == "Ended"], na.rm = TRUE)/Stats$NumberHomeGoalStreaks
Stats$MaxLengthHomeGoalStreaks = max(Worm$HomeGoalStreak[Worm$HomeGoalStreakStatus == "Ended"], na.rm = TRUE)
Stats$NumberHomeScoreStreaks = sum(Worm$HomeScoreStreakStatus == "Ended", na.rm = TRUE)
Stats$AvePtsHomeScoreStreaks = sum(Worm$HomeScoreStreak[Worm$HomeScoreStreakStatus == "Ended"], na.rm = TRUE)/Stats$NumberHomeScoreStreaks
Stats$MaxPtsHomeScoreStreaks = max(Worm$HomeScoreStreak[Worm$HomeScoreStreakStatus == "Ended"], na.rm = TRUE)
Stats$NumberAwayGoalStreaks = sum(Worm$AwayGoalStreakStatus == "Ended", na.rm = TRUE)
Stats$AveLengthAwayGoalStreaks = sum(Worm$AwayGoalStreak[Worm$AwayGoalStreakStatus == "Ended"], na.rm = TRUE)/Stats$NumberAwayGoalStreaks
Stats$MaxLengthAwayGoalStreaks = max(Worm$AwayGoalStreak[Worm$AwayGoalStreakStatus == "Ended"], na.rm = TRUE)
Stats$NumberAwayScoreStreaks = sum(Worm$AwayScoreStreakStatus == "Ended", na.rm = TRUE)
Stats$AvePtsAwayScoreStreaks = sum(Worm$AwayScoreStreak[Worm$AwayScoreStreakStatus == "Ended"], na.rm = TRUE)/Stats$NumberAwayScoreStreaks
Stats$MaxPtsAwayScoreStreaks = max(Worm$AwayScoreStreak[Worm$AwayScoreStreakStatus == "Ended"], na.rm = TRUE)
return(Stats)
}
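If you’d like a quick cross-check on the goal streak counts, base R’s rle offers a compact alternative. As I read the function above, a team’s goal streak is a run of consecutive goals by that team that is ended only by a goal from the other team (behinds by either side don’t end it), so dropping the behinds and run-length encoding the scoring team should give the same streaks. The sketch below uses an invented sequence of eight scores and is an illustration, not a replacement for GameStats.

```r
# Invented sequence of scores, with period start and end rows already removed
score_team = c("Home", "Home", "Away", "Home", "Away", "Away", "Away", "Home")
score_type = c("Goal", "Behind", "Behind", "Goal", "Goal", "Behind", "Goal", "Goal")

# Keep goals only, since behinds do not interrupt a goal streak
goal_team = score_team[score_type == "Goal"]

# Each run of the same team is one goal streak
runs = rle(goal_team)
home_streaks = runs$lengths[runs$values == "Home"]
away_streaks = runs$lengths[runs$values == "Away"]

NumberHomeGoalStreaks    = length(home_streaks)
MaxLengthHomeGoalStreaks = max(home_streaks)
AveLengthHomeGoalStreaks = mean(home_streaks)
```

In this example the home team has two goal streaks, of lengths 2 and 1, and the away team has a single streak of length 2.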
I won’t discuss anything from this code since this blog is long enough already and, hopefully, much of the code is self-explanatory.
To finish, then, here’s how we call this code in parallel.
registerDoParallel(cl <- makeCluster(20))
sim_results_list = foreach(rep_num = 1:reps, .packages = c("dplyr", "HDInterval")) %dopar% {
GS = GameStats(sim_results %>% filter(RepNum == rep_num))
GS$RepNum = rep_num
return(GS)
}
stopCluster(cl)
sim_game_stats = do.call(rbind.data.frame, sim_results_list)
The table at right compares the results of my simulations of the 2024 season with those for the actual games of 2024.
In general, the differences are relatively minor, although it does appear that the simulated games are a little less streaky.
The differences in the conversion rates can be entirely attributed to the MoSHBODS inputs, which set expected conversion rates at about 52.6%.
Below is a link where you can download the code and the MoSHBODS 2024 data. Please let me know if you’ve any questions or suggestions.