Building Your Own Men's AFL Game Score Progression Simulator

In this (long) blog I’ll walk you through the concepts and R code behind the creation of a fairly simple score progression simulator.

(There’s a link for you to download the entire code yourself at the end of the blog.)

All we’ll be interested in are “events” - period starts, period ends, goals and behinds - and the algorithm will determine for us, given the event that’s just occurred, what the next event is, and how far away in time it will take place.

To be able to do that, the first thing we’re going to need is some data about the typical time between events based on historical games, which we can obtain using the Swiss Army knife of footy data, the fitzRoy R package.

GETTING THE SCORE PROGRESSION DATA

Here’s the code that will extract all of the available score progression data:

library(fitzRoy)
library(dplyr)
# Create a worm receptacle
if (exists("AllWorms")) { rm(AllWorms) }
# Create a dataframe to track missing games
Missed = data.frame(Year = rep(0, 1000),
                    ID = rep("", 1000))
RowNum = 0                     
# NB Data is only available on fitzRoy for 2012 onwards
for (SeasonThis in 2012:2024)
{
    # Get fixture ids for season
    ff = fetch_fixture_afl(SeasonThis)
    for (GameThis in ff$providerId)
    {
       # Catch worms with errors
       WormThis = try(fetch_score_worm_data(GameThis), silent = TRUE)      
       # If the worm length is greater than 1, we have a score progression ...
       if(length(WormThis) > 1)
       {
           # Add some useful details from the fixture data to the current worm 
           WormThis$Year = ff$compSeason.year[1]
           WormThis$RoundNum = ff$round.roundNumber[ff$providerId == GameThis]
           WormThis$RoundName = ff$round.name[ff$providerId == GameThis]
           WormThis$Venue = ff$venue.name[ff$providerId == GameThis]
           WormThis$HomeTeam = ff$home.team.name[ff$providerId == GameThis]
           WormThis$AwayTeam = ff$away.team.name[ff$providerId == GameThis]
           # Yes, this is a terrible way of progressively adding to a dataframe. Feel free to improve
           if (!exists("AllWorms")) { AllWorms = WormThis } else { AllWorms = bind_rows(AllWorms, WormThis) }
       }
       # If there's no worm, record the missing worm details
       if(length(WormThis) == 1) 
       {
          RowNum = RowNum + 1
          Missed$Year[RowNum] = SeasonThis
          Missed$ID[RowNum] = GameThis          
          print(paste("Missed: ", RowNum, sep = ""))
        }
    }    
    print(paste("Finished Season: ", SeasonThis, sep = ""))
}                                                            
write.csv(AllWorms, 'AllWormData.csv', row.names = FALSE)

After you run this you should have an object named AllWorms, the first few rows and columns of which will look like the following:

Altogether, AllWorms should have score progressions for 2,626 games (with 38 missing). Check this by counting the number of unique match_ids (ie AllWorms %>% count(match_id), which returns one row per game)
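As the comment in the code admits, growing a dataframe inside the loop is not ideal. A minimal sketch of one common alternative is below: collect each game's worm in a list, bind once at the end, then count the games. The fetch_one_game() function and its toy output are hypothetical stand-ins for fetch_score_worm_data() so that the example runs offline.

```r
library(dplyr)

# Hypothetical stand-in for fetch_score_worm_data(): returns a tiny worm,
# or throws an error for a game that has no score progression
fetch_one_game = function(id) {
    if (id == "G3") stop("no worm available")
    data.frame(match_id = id, periodSeconds = c(0, 95, 210))
}

game_ids = c("G1", "G2", "G3", "G4")
worm_list = vector("list", length(game_ids))
missed_ids = character(0)

for (i in seq_along(game_ids)) {
    worm = try(fetch_one_game(game_ids[i]), silent = TRUE)
    if (inherits(worm, "try-error")) {
        missed_ids = c(missed_ids, game_ids[i])   # record the missing game
    } else {
        worm_list[[i]] = worm                     # keep the worm for later
    }
}

# Drop the empty slots left by missed games, then bind once
AllWormsToy = bind_rows(Filter(Negate(is.null), worm_list))

# Number of games with a score progression
n_games = n_distinct(AllWormsToy$match_id)
print(n_games)
print(missed_ids)
```

Testing for the "try-error" class is a slightly more explicit check than looking at the length of the returned object, but either approach should serve here.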

There will be 42 columns, some of the most important of which are, for our current purposes:

match_id: unique identifier for the game

periodNumber: from 1 to 4 (or, very occasionally, 5)

scoreType: “GOAL”, “BEHIND”, “RUSHED_BEHIND” or NA. Is NA for the first and last row of a game’s worm only

homeOrAway: Which team recorded the scoreType

cumulativeSeconds: When did the score occur, relative to the start of the match

GATHERING INTER-EVENT TIME DATA

Don’t let that fancy heading scare you off: all we’ll be doing here is extracting the time taken between events for each of the following 10 event pairs:

  • The start of a quarter and either side registering a goal

  • The start of a quarter and either side registering a behind

  • One team registering a goal and then another goal

  • One team registering a goal and then a behind

  • One team registering a behind and then another behind

  • One team registering a behind and then a goal

  • One team registering a goal and then the other team registering a goal

  • One team registering a goal and then the other team registering a behind

  • One team registering a behind and then the other team registering a behind

  • One team registering a behind and then the other team registering a goal

The rationale for looking at each of these event pairs separately is that their distributions are likely to be quite different. Consider, for example, the possible time between two goals being kicked by the same team, where the ball must be taken back to the centre in between, and the possible time between a behind and a goal being registered by the same team, which requires only that the kickout goes awry.

Here’s the code to extract that data:

library(dplyr)
library(HDInterval) # we'll need these later
library(doParallel) # we'll need these later

# Read in the worms data
AllWorms = read.csv('AllWormData.csv')
# Ensure that it's in chronological order for every game
AllWorms = AllWorms %>% arrange(match_id, cumulativeSeconds)
# Create a series of lagged columns that will, among other things, bring together the current event with the previous one
AllWorms_Supp = AllWorms %>% filter(Year != 2020) %>% 
                             group_by(match_id, periodNumber) %>% 
                             mutate(LastEventTime = lag(periodSeconds,1)) %>%
                             mutate(LastEventType = if_else(is.na(LastEventTime), "Period Start", lag(scoreType,1)),
                                    LastEventTeam = if_else(is.na(LastEventTime), "Period Start", lag(teamName,1))) %>%
                             mutate(TimeBetweenEvents = if_else(is.na(LastEventTime), periodSeconds, periodSeconds - LastEventTime))          
# Discard any row that's missing a Time Between Events or where that time difference is more than 1800 seconds
AllWorms_SuccessiveEventsData = AllWorms_Supp %>% filter(!is.na(TimeBetweenEvents) & TimeBetweenEvents <= 1800) %>% 
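            # Treat both a missing previous team and the "Period Start" label as a period start,
            # so that the first score of every quarter is classified consistently
            mutate(LastEventWhich = if_else(is.na(LastEventTeam) | LastEventTeam == "Period Start", "Period Start",
                                    if_else(LastEventTeam == teamName, "Same Team", "Other Team")))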
AllWorms_SuccessiveEventsData$scoreType = ifelse(is.na(AllWorms_SuccessiveEventsData$scoreType), "None",
                                          ifelse(AllWorms_SuccessiveEventsData$scoreType == "GOAL", "Goal", "Behind"))
AllWorms_SuccessiveEventsData$LastEventType = ifelse(is.na(AllWorms_SuccessiveEventsData$LastEventType) |    
                          AllWorms_SuccessiveEventsData$LastEventType == "Period Start", "Period Start",                                               
                                             ifelse(AllWorms_SuccessiveEventsData$LastEventType == "GOAL", "Goal", "Behind"))                                                 
AllWorms_SuccessiveEventsData$EventSequence = paste(AllWorms_SuccessiveEventsData$LastEventType, " then ", AllWorms_SuccessiveEventsData$scoreType, sep = "")

The AllWorms_SuccessiveEventsData dataframe is the AllWorms dataframe with six new columns that look like the following.

It’s in this format that we can gather complete time samples of each of the 10 event pairs we care about. For example, the second row gives us an example of the Period Start then Goal event pair and a time between them of 81 seconds. The third row gives an example of a goal being followed by a behind from the same team: here it took 80 seconds.

We’ll come back later and discuss exactly how we gather, summarise and use this data.
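In the meantime, here is a minimal sketch of the lag-and-difference idea on a toy worm for a single quarter. The team names and times are invented for illustration (chosen so that the first two gaps are 81 and 80 seconds), and the score types are assumed to have already been recoded to "Goal" and "Behind".

```r
library(dplyr)

# Toy worm for one quarter of one game (values invented for illustration)
ToyWorm = data.frame(
    match_id      = "M1",
    periodNumber  = 1,
    periodSeconds = c(81, 161, 300, 420),
    scoreType     = c("Goal", "Behind", "Goal", "Goal"),
    teamName      = c("Cats", "Cats", "Swans", "Swans")
)

# Pair every event with the one before it in the same quarter
ToyPairs = ToyWorm %>%
    group_by(match_id, periodNumber) %>%
    mutate(LastEventTime = lag(periodSeconds, 1),
           LastEventType = if_else(is.na(LastEventTime), "Period Start", lag(scoreType, 1)),
           LastEventWhich = if_else(is.na(LastEventTime), "Period Start",
                            if_else(lag(teamName, 1) == teamName, "Same Team", "Other Team")),
           TimeBetweenEvents = if_else(is.na(LastEventTime), periodSeconds,
                                       periodSeconds - LastEventTime),
           EventSequence = paste(LastEventType, "then", scoreType)) %>%
    ungroup()

# One row per event-pair type: sample size and typical gap in seconds
PairSummary = ToyPairs %>%
    group_by(EventSequence, LastEventWhich) %>%
    summarise(N = n(), MedianSecs = median(TimeBetweenEvents), .groups = "drop")
print(PairSummary)
```

Run against the full dataset, a summary like this is a quick way to see how many historical examples sit behind each of the 10 event pairs.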

GATHERING QUARTER LENGTH DATA

We also need to model the end of quarters, for which purpose we’ll fit a simple ordinary least squares model to historical quarter length data, using the period number and the number of goals and behinds registered as explanatory variables or regressors.

QtrTimes = AllWorms  %>% filter(Year != 2020) %>% filter(periodNumber < 5) %>% group_by(periodNumber, match_id) %>%
summarise(Examples = length(match_id),
          Length = TotalPeriodSeconds[1],
          NumberGoals = sum(scoreType == "GOAL", na.rm = TRUE),
          # Count behinds only (including rushed behinds), not goals
          NumberBehinds = sum(scoreType %in% c("BEHIND", "RUSHED_BEHIND"), na.rm = TRUE))
model_rel_mean = lm(Length ~ periodNumber + NumberGoals + NumberBehinds, data = QtrTimes)
# Length ~ 1,593 + 4.6 x periodNumber + 33 x NumberGoals + 2.6 x NumberBehinds
sd_dev = sd(QtrTimes$Length - predict(model_rel_mean))
# 107.8851

We’ll use this information to set an initial length for every quarter by drawing randomly from a Normal distribution with a mean of 1,593 + 4.6 x the period number seconds, and a standard deviation of 108 seconds.

Then, during the simulation, we’ll add 33 seconds to the quarter length as each goal is scored, and 3 seconds as each behind is scored.
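As a rough sketch of those two steps, the snippet below draws four initial quarter lengths and then extends one quarter for the scores registered in it. The coefficients are the rounded values quoted in the model comment, hard-coded here as assumptions rather than read from the fitted model, and extend_quarter() is a hypothetical helper that rounds once at the end rather than after every score.

```r
# Rounded coefficients quoted in the model comment (assumed values)
b0 = 1593; b_period = 4.6; b_goal = 33; b_behind = 2.6; resid_sd = 108

set.seed(1)

# Initial length, in seconds, for each of the four quarters
base_lengths = round(rnorm(4, mean = b0 + b_period * 1:4, sd = resid_sd), 0)

# Hypothetical helper: extend a quarter for the scores registered in it
extend_quarter = function(q_length, n_goals, n_behinds) {
    round(q_length + b_goal * n_goals + b_behind * n_behinds, 0)
}

# A first quarter that starts at its mean length and sees 4 goals and 3 behinds
mean_q1 = b0 + b_period * 1
final_q1 = extend_quarter(mean_q1, n_goals = 4, n_behinds = 3)

print(base_lengths)
print(final_q1)   # 1597.6 + 132 + 7.8 = 1737.4, which rounds to 1737
```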

Let’s move on then to that simulation.

SIMULATING GAMES

Rather than providing you with code that simulates a single game, I’ll instead provide code that will let you do it for a number of games, and in parallel. In particular, I’m going to simulate all of the games in the 2024 season 100 times, using as the inputs for expected scoring shots, expected margin, and expected conversion those from my MoSHBODS model.

Because of its length, here we’ll review the code in chunks.

# Read in the MoSHBODS expected scoring, margin and conversion data
StatData = read.csv('MoSH_2024Season.csv')
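# Set for reproducibility. Change to get fresh random data
# NB: set.seed() here seeds only the main R session, not the parallel workers, so
# %dopar% results can still vary between runs; the doRNG package is one option if you need reproducible parallel runs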
set.seed(57834)
# Play each of the 216 games 100 times
reps = 21600
# Start a timer
t_start = Sys.time() 
# Create a cluster
registerDoParallel(cl <- makeCluster(20))
# Note that each pass will simulate a single game
results_list = foreach(rep_num = 1:reps, .packages = c("dplyr", "HDInterval")) %dopar% {
      # Cycle through the 216 games in order
      chosen_game = rep_num %% 216      
      if (chosen_game == 0) { chosen_game = 216 }
      # Set Desired number of Scoring Shots
      ExpSS = StatData$ExpSS[chosen_game]
      # Set Desired Margin (from home team perspective)
      ExpMargin = StatData$ExpMarg[chosen_game]
      # Desired Conversion Rate
      # We've assumed the same, fixed conversion rate for the home and away teams here
      ExpConvRate = StatData$ExpConv[chosen_game]
      # Calculate the implied home team share of scoring shots
      # We will use this value as the fixed probability that the next scoring shot is by the home team
      ExpMarginAsSSDiff = ExpMargin/(ExpConvRate*5+1)
      ExpSS_Home = (ExpMarginAsSSDiff + ExpSS)/2
      ExpSS_Away = ExpSS - ExpSS_Home      
      ExpHomeShare_SS = ExpSS_Home/ExpSS
      # Generate initial base quarter lengths
      ExpLengths = c(round(rnorm(1,model_rel_mean$coef[1] + model_rel_mean$coef[2]*1, sd_dev),0),
                     round(rnorm(1,model_rel_mean$coef[1] + model_rel_mean$coef[2]*2, sd_dev),0),
                     round(rnorm(1,model_rel_mean$coef[1] + model_rel_mean$coef[2]*3, sd_dev),0),
                     round(rnorm(1,model_rel_mean$coef[1] + model_rel_mean$coef[2]*4, sd_dev),0))
      # Create a receptacle for the simulation data for a single game. Each row will be an event
      Worm = data.frame(Period = rep(0,200),
                        EventNum = rep(0,200),
                        ElapsedTime = rep(0,200),
                        ScoreTeam = rep("",200),
                        ScoreType = rep("",200))     
      EventNum = 0
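To see what the home share calculation above is doing, here is a small worked example. The three inputs are invented for illustration and are not MoSHBODS outputs; PointsPerShot and ImpliedMargin are names introduced only for this sketch.

```r
# Illustrative inputs (invented, not taken from MoSHBODS)
ExpSS = 50          # expected total scoring shots in the game
ExpMargin = 12      # expected home team margin in points
ExpConvRate = 0.53  # expected conversion rate for both teams

# Expected points per scoring shot: 6 for a goal, 1 for a behind
PointsPerShot = 6 * ExpConvRate + 1 * (1 - ExpConvRate)   # same as ExpConvRate*5 + 1

# Convert the points margin into a scoring shot difference, then split the shots
ExpMarginAsSSDiff = ExpMargin / PointsPerShot
ExpSS_Home = (ExpMarginAsSSDiff + ExpSS) / 2
ExpSS_Away = ExpSS - ExpSS_Home
ExpHomeShare_SS = ExpSS_Home / ExpSS

# Sanity check: the implied scores should reproduce the expected margin
ImpliedMargin = (ExpSS_Home - ExpSS_Away) * PointsPerShot
print(round(ExpHomeShare_SS, 4))
print(ImpliedMargin)
```

The check at the end confirms that splitting the scoring shots this way, at a common conversion rate, gives back the expected margin we started with.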

We now have everything set up ready to run the event-by-event part of the simulation.

      for (QtrNum in 1:4)
      {
          ElapsedTime = 0
          
          # Initialise the Quarter Length
          QLength = ExpLengths[QtrNum]
      
          # Create the first event row for the quarter, which is always the same
          EventNum = EventNum + 1
          Worm$Period[EventNum] = QtrNum
          Worm$EventNum[EventNum] = EventNum
          Worm$ElapsedTime[EventNum] = 0
          Worm$ScoreTeam[EventNum] = "Period Start"
          Worm$ScoreType[EventNum] = "Period Start"
          
          QtrEnded = FALSE
          
          while (!QtrEnded)
          {
              EventNum = EventNum + 1
              
              # Get preliminary next scorer and score type
              if (runif(1,0,1) < ExpHomeShare_SS)
              {
                 Worm$ScoreTeam[EventNum] = "Home"
              } else
                {
                     Worm$ScoreTeam[EventNum] = "Away"                 
                }

              if (runif(1,0,1) < ExpConvRate)
              {
                 Worm$ScoreType[EventNum] = "Goal"
              }  else
                 {
                    Worm$ScoreType[EventNum] = "Behind"
                 }  
              
              # Get elapsed time to score for whichever of the 10 event pairs is applicable
              # These could all be in a single nested if-else, but that would look ugly.
              # The option of a CASE statement would be lovely
 
              if (Worm$ScoreTeam[EventNum-1] == "Period Start" &  Worm$ScoreType[EventNum] == "Goal")
              {
                  # We extract the relevant historical data given the pair of events we are considering
                  RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Period Start then Goal", LastEventWhich == "Period Start")

                 # We choose a random point on the inverseCDF formed from the data
                 # Think of this as selecting a random value from the entire sample
                 # Creating a CDF just covers over any gaps 
                 TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
              } 
  
              if (Worm$ScoreTeam[EventNum-1] == "Period Start" &  Worm$ScoreType[EventNum] == "Behind")
              {
                 RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Period Start then Behind", LastEventWhich == "Period Start")
                 TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
              }   
              
              if (Worm$ScoreTeam[EventNum-1] != "Period Start")
              {              
                    if (Worm$ScoreTeam[EventNum] == Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Goal" & Worm$ScoreType[EventNum-1] == "Goal")
                    {
                       RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Goal then Goal", LastEventWhich == "Same Team")
                       TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
                    }   

                    if (Worm$ScoreTeam[EventNum] == Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Goal" & Worm$ScoreType[EventNum-1] == "Behind")
                    {
                       RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Behind then Goal", LastEventWhich == "Same Team")
                       TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
                    }   

                    if (Worm$ScoreTeam[EventNum] == Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Behind" & Worm$ScoreType[EventNum-1] == "Behind")
                    {
                       RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Behind then Behind", LastEventWhich == "Same Team")
                       TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
                    }   

                    if (Worm$ScoreTeam[EventNum] == Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Behind" & Worm$ScoreType[EventNum-1] == "Goal")
                    {
                       RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Goal then Behind", LastEventWhich == "Same Team")
                       TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
                    }   
                
                    if (Worm$ScoreTeam[EventNum] != Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Goal" & Worm$ScoreType[EventNum-1] == "Goal")
                    {
                       RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Goal then Goal", LastEventWhich == "Other Team")
                       TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
                    }   

                    if (Worm$ScoreTeam[EventNum] != Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Goal" & Worm$ScoreType[EventNum-1] == "Behind")
                    {
                       RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Behind then Goal", LastEventWhich == "Other Team")
                       TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
                    }   

                    if (Worm$ScoreTeam[EventNum] != Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Behind" & Worm$ScoreType[EventNum-1] == "Behind")
                    {
                       RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Behind then Behind", LastEventWhich == "Other Team")
                       TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
                    }   

                    if (Worm$ScoreTeam[EventNum] != Worm$ScoreTeam[EventNum-1] & Worm$ScoreType[EventNum] == "Behind" & Worm$ScoreType[EventNum-1] == "Goal")
                    {
                       RelevantData = AllWorms_SuccessiveEventsData %>% filter(EventSequence == "Goal then Behind", LastEventWhich == "Other Team")
                       TimeToScore = round(inverseCDF(runif(1,0,1), ecdf(RelevantData$TimeBetweenEvents)),0)
                    }   
               }
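For anyone who shares the wish for a CASE statement, one possible tidy-up is sketched below: split the historical times by event pair once, outside the simulation loop, then look up the right sample with a text key. This is an alternative sketch rather than the code used for the results in this blog. The toy data is invented, draw_time() is a hypothetical helper, and base R's quantile() with type = 1 (the inverse of the empirical CDF) is swapped in for inverseCDF(), so every draw is one of the observed times.

```r
set.seed(42)

# Toy stand-in for AllWorms_SuccessiveEventsData (values invented)
ToyData = data.frame(
    EventSequence     = rep(c("Goal then Goal", "Behind then Goal"), each = 5),
    LastEventWhich    = rep(c("Same Team", "Other Team"), each = 5),
    TimeBetweenEvents = c(60, 75, 90, 120, 200, 20, 35, 50, 80, 140)
)

# Build the lookup once: one vector of times per event pair
PairKey = paste(ToyData$EventSequence, ToyData$LastEventWhich, sep = " | ")
TimesByPair = split(ToyData$TimeBetweenEvents, PairKey)

# Hypothetical helper: build the key for the candidate event and draw a time
draw_time = function(last_type, this_type, which_team) {
    key = paste(paste(last_type, "then", this_type), which_team, sep = " | ")
    # type = 1 inverts the empirical CDF, so the result is always an observed value
    unname(quantile(TimesByPair[[key]], probs = runif(1), type = 1))
}

t1 = draw_time("Goal", "Goal", "Same Team")
t2 = draw_time("Behind", "Goal", "Other Team")
print(c(t1, t2))
```

Because the filtering happens only once, this should also be noticeably faster than filtering the full dataframe for every simulated event.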

At this point we have a candidate next event, but we’ve not checked to see if there’s time for it before the quarter ends.

In the first version of this script, I simply checked to see if the elapsed time plus the between-event time was greater than the current quarter length, and rejected the event if so. It struck me that this might disproportionately affect goal-scoring because the average goal to goal time for same or opposite teams, and the behind to goal time for opposite teams, were longer than the goal to behind or behind to behind times. As such, a goal was more likely to be rejected than a behind, which would tend to drag the conversion rate down.

So, in the final version, I allow a candidate final goal if it would be scored within 30 seconds of the current quarter length. Think of it as a goal after the siren.

In the simulations for the 2024 season, about 40% of simulated games had a goal after one or more of the 4 sirens.

So, let’s handle the end of quarters.

               if (ElapsedTime + TimeToScore <= QLength)
               {
                   ElapsedTime = ElapsedTime + TimeToScore
                   Worm$ElapsedTime[EventNum] = ElapsedTime
                   Worm$Period[EventNum] = QtrNum
                   Worm$EventNum[EventNum] = EventNum

                   # Extend the quarter length based on the event type
                   if(Worm$ScoreType[EventNum] == "Goal")
                   {
                      QLength = round(QLength + model_rel_mean$coef[3],0)
                   } else
                     {
                        QLength = round(QLength + model_rel_mean$coef[4],0)
                     } 
                   
               }  else
                  {
                      # Allow a goal at quarter end to extend the quarter by no more than 30 seconds
                      if ((Worm$ScoreType[EventNum] == "Behind") | (ElapsedTime + TimeToScore > QLength + 30))
                      {
                          Worm$Period[EventNum] = QtrNum
                          Worm$EventNum[EventNum] = EventNum
                          Worm$ScoreTeam[EventNum] = "Period End"
                          Worm$ScoreType[EventNum] = "Period End"
                          Worm$ElapsedTime[EventNum] = QLength
                          QtrEnded = TRUE
                      }  else
                         {
                              ElapsedTime = ElapsedTime + TimeToScore
                              QLength = ElapsedTime
                              Worm$ElapsedTime[EventNum] = ElapsedTime
                              Worm$Period[EventNum] = QtrNum
                              Worm$EventNum[EventNum] = EventNum
    
                              # Create simultaneous end-of-period event
                              EventNum = EventNum + 1
                              Worm$Period[EventNum] = QtrNum
                              Worm$EventNum[EventNum] = EventNum
                              Worm$ScoreTeam[EventNum] = "Period End"
                              Worm$ScoreType[EventNum] = "Period End"
                              Worm$ElapsedTime[EventNum] = QLength
                              QtrEnded = TRUE
                         }         
                 }
          }
      }

As a final step, we process the completed simulated games and add some useful cumulative data:

      # Keep only the used rows of Worm
      Worm = Worm[1:EventNum,]
      
      Worm$HomeGoals = 0
      Worm$HomeBehinds = 0
      
      Worm$AwayGoals = 0
      Worm$AwayBehinds = 0
      
      for (rn in 2:nrow(Worm))
      {
          if (!Worm$ScoreTeam[rn] %in% c("Period Start", "Period End"))
          {
              if (Worm$ScoreTeam[rn] == "Home")
              {
                 if (Worm$ScoreType[rn] == "Goal")
                 {
                    Worm$HomeGoals[rn] = Worm$HomeGoals[rn-1] + 1
                    Worm$HomeBehinds[rn] = Worm$HomeBehinds[rn-1]
                    Worm$AwayGoals[rn] = Worm$AwayGoals[rn-1]
                    Worm$AwayBehinds[rn] = Worm$AwayBehinds[rn-1]
                 } else
                   {
                      Worm$HomeGoals[rn] = Worm$HomeGoals[rn-1] 
                      Worm$HomeBehinds[rn] = Worm$HomeBehinds[rn-1] + 1
                      Worm$AwayGoals[rn] = Worm$AwayGoals[rn-1]
                      Worm$AwayBehinds[rn] = Worm$AwayBehinds[rn-1]
                    }
              } else
                {
                   if (Worm$ScoreType[rn] == "Goal")
                   {
                      Worm$HomeGoals[rn] = Worm$HomeGoals[rn-1]
                      Worm$HomeBehinds[rn] = Worm$HomeBehinds[rn-1]
                      Worm$AwayGoals[rn] = Worm$AwayGoals[rn-1] + 1
                      Worm$AwayBehinds[rn] = Worm$AwayBehinds[rn-1]
                   } else
                     {
                        Worm$HomeGoals[rn] = Worm$HomeGoals[rn-1] 
                        Worm$HomeBehinds[rn] = Worm$HomeBehinds[rn-1]
                        Worm$AwayGoals[rn] = Worm$AwayGoals[rn-1]
                        Worm$AwayBehinds[rn] = Worm$AwayBehinds[rn-1] + 1
                      }
                }
            } else
              {
                  Worm$HomeGoals[rn] = Worm$HomeGoals[rn-1] 
                  Worm$HomeBehinds[rn] = Worm$HomeBehinds[rn-1]
                  Worm$AwayGoals[rn] = Worm$AwayGoals[rn-1]
                  Worm$AwayBehinds[rn] = Worm$AwayBehinds[rn-1]
              }    
      }
      
      Worm$HomeScore = 6*Worm$HomeGoals + Worm$HomeBehinds                
      Worm$AwayScore = 6*Worm$AwayGoals + Worm$AwayBehinds                
      Worm$HomeMargin = Worm$HomeScore - Worm$AwayScore
      
      Worm$TotalElapsedTime = ifelse(Worm$Period == 1, Worm$ElapsedTime,
                              ifelse(Worm$Period == 2, Worm$ElapsedTime + max(Worm$ElapsedTime[Worm$Period == 1]),
                              ifelse(Worm$Period == 3, Worm$ElapsedTime + max(Worm$ElapsedTime[Worm$Period == 1]) + max(Worm$ElapsedTime[Worm$Period == 2]),
                                     Worm$ElapsedTime + max(Worm$ElapsedTime[Worm$Period == 1]) + max(Worm$ElapsedTime[Worm$Period == 2]) + max(Worm$ElapsedTime[Worm$Period == 3]))))
             
      Worm$RepNum = rep_num
      Worm$ExpectedMargin = ExpMargin
      Worm$ExpectedSS = ExpSS
      Worm$ExpectedConv = ExpConvRate
      Worm$HomeShareSS = ExpHomeShare_SS
             
      return(Worm)               
      
}               

stopCluster(cl)

sim_results = do.call(rbind.data.frame, results_list)

print(Sys.time() - t_start)      
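If the row-by-row loop that builds the cumulative goals and behinds feels heavy, the same running totals can be produced with cumsum(). The sketch below uses a toy event log with invented values, in the same shape as the simulated Worm.

```r
# Toy event log in the same shape as the simulated Worm (values invented)
ToyWorm = data.frame(
    ScoreTeam = c("Period Start", "Home", "Away", "Home", "Away", "Period End"),
    ScoreType = c("Period Start", "Goal", "Behind", "Behind", "Goal", "Period End")
)

# Each logical vector is TRUE only on the rows that add to that tally,
# so its cumulative sum is the running total after every event
ToyWorm$HomeGoals   = cumsum(ToyWorm$ScoreTeam == "Home" & ToyWorm$ScoreType == "Goal")
ToyWorm$HomeBehinds = cumsum(ToyWorm$ScoreTeam == "Home" & ToyWorm$ScoreType == "Behind")
ToyWorm$AwayGoals   = cumsum(ToyWorm$ScoreTeam == "Away" & ToyWorm$ScoreType == "Goal")
ToyWorm$AwayBehinds = cumsum(ToyWorm$ScoreTeam == "Away" & ToyWorm$ScoreType == "Behind")

ToyWorm$HomeScore  = 6 * ToyWorm$HomeGoals + ToyWorm$HomeBehinds
ToyWorm$AwayScore  = 6 * ToyWorm$AwayGoals + ToyWorm$AwayBehinds
ToyWorm$HomeMargin = ToyWorm$HomeScore - ToyWorm$AwayScore

print(ToyWorm)
```

Period Start and Period End rows match none of the four conditions, so they simply carry the previous totals forward, which is what the loop version does too.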

CREATING METRICS FOR EACH SIMULATED GAME

The final thing we’ll do is create a range of metrics that will summarise each simulated game, for example, the number of goal streaks recorded by each team, and the longest for each.
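To give a feel for the goal streak idea before the full function, here is a compact sketch using base R's rle(). It assumes, in line with the function that follows, that a behind by either team does not interrupt a goal streak and that a single goal counts as a streak of one. The event data is invented and this is an illustration only, not the method used in the function.

```r
# Scoring events in order, with period markers already removed (values invented)
Events = data.frame(
    ScoreTeam = c("Home", "Home", "Away", "Home", "Away", "Away", "Away", "Home"),
    ScoreType = c("Goal", "Goal", "Behind", "Goal", "Goal", "Behind", "Goal", "Goal")
)

# Behinds do not interrupt a goal streak, so look only at who kicked each goal
GoalKickers = Events$ScoreTeam[Events$ScoreType == "Goal"]

# Run-length encode the sequence: each run is one goal streak
Runs = rle(GoalKickers)

NumHomeStreaks = sum(Runs$values == "Home")
LongestHome = max(Runs$lengths[Runs$values == "Home"])
LongestAway = max(Runs$lengths[Runs$values == "Away"])

print(c(NumHomeStreaks, LongestHome, LongestAway))
```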

We’ll do this using the following function:

GameStats = function(Worm)
{
     Worm$HomeMarginLag = lag(Worm$HomeMargin,1)
     Worm$HomeMarginLag2 = lag(Worm$HomeMargin,2)
     
     HomeLeadTime = 0   
     AwayLeadTime = 0   
     DrawnTime = 0  
     UnderSixLeadTime = 0 
     MarginSecs = 0
     AbsMarginSecs = 0
     
     Worm$HomeScoreStreak = 0
     Worm$HomeGoalStreak = 0
        
     Worm$AwayScoreStreak = 0
     Worm$AwayGoalStreak = 0
     
     Worm$HomeScoreStreakStatus = "Off"
     Worm$HomeGoalStreakStatus = "Off"
        
     Worm$AwayScoreStreakStatus = "Off"
     Worm$AwayGoalStreakStatus = "Off"
     
     Worm = Worm %>% mutate(LastEventTime = lag(TotalElapsedTime, 1)) %>%
                     mutate(LastHomeMargin = if_else(is.na(LastEventTime), NA_real_, lag(HomeMargin,1))) %>%
                     mutate(TimeBetweenEvents = if_else(is.na(LastEventTime), TotalElapsedTime, TotalElapsedTime - LastEventTime))
     
     LastLeader = "None"
     Worm$IsLeadChange = "No"
                                              
     for (rn in 2:nrow(Worm))
     {
          if (Worm$LastHomeMargin[rn] == 0)
          {
              DrawnTime = DrawnTime + Worm$TimeBetweenEvents[rn]
          }  else
             {
               if (Worm$LastHomeMargin[rn] > 0)
               {
                   HomeLeadTime = HomeLeadTime + Worm$TimeBetweenEvents[rn]
               } else
                 {
                     AwayLeadTime = AwayLeadTime + Worm$TimeBetweenEvents[rn]
                 }    
             }
          
          if ((Worm$HomeMargin[rn] > 0 & LastLeader == "Away") | (Worm$HomeMargin[rn] < 0 & LastLeader == "Home"))
              { 
                 Worm$IsLeadChange[rn] = "Yes"
              }   
              
          if (Worm$HomeMargin[rn] > 0) { LastLeader = "Home" }
          if (Worm$HomeMargin[rn] < 0) { LastLeader = "Away" }
              
          if (!is.na(Worm$HomeMarginLag[rn]))
          {    
              MarginSecs = MarginSecs + Worm$HomeMarginLag[rn] * Worm$TimeBetweenEvents[rn]    
              AbsMarginSecs = AbsMarginSecs + abs(Worm$HomeMarginLag[rn]) * Worm$TimeBetweenEvents[rn]    
          }    
          
          if (!is.na(Worm$HomeMarginLag[rn]) & abs(Worm$HomeMarginLag[rn]) < 6)
          {
              UnderSixLeadTime = UnderSixLeadTime + Worm$TimeBetweenEvents[rn]
          }
          
          ### Calculate streak data
          if (Worm$ScoreType[rn] %in% c("Period End", "Period Start"))
          {
              Worm$HomeScoreStreak[rn] = Worm$HomeScoreStreak[rn-1]
              Worm$HomeGoalStreak[rn] = Worm$HomeGoalStreak[rn-1]
           
              Worm$AwayScoreStreak[rn] = Worm$AwayScoreStreak[rn-1]
              Worm$AwayGoalStreak[rn] = Worm$AwayGoalStreak[rn-1]
           
              Worm$HomeScoreStreakStatus[rn] = Worm$HomeScoreStreakStatus[rn-1]
              Worm$HomeGoalStreakStatus[rn] = Worm$HomeGoalStreakStatus[rn-1]
        
              Worm$AwayScoreStreakStatus[rn] = Worm$AwayScoreStreakStatus[rn-1]
              Worm$AwayGoalStreakStatus[rn] = Worm$AwayGoalStreakStatus[rn-1]
          }
          
          if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Home" & Worm$HomeGoalStreak[rn-1] == 0 & Worm$HomeScoreStreak[rn-1] == 0)
          {
              Worm$HomeScoreStreak[rn] = 6
              Worm$HomeGoalStreak[rn] = 1
              
              Worm$HomeScoreStreakStatus[rn] = "On"
              Worm$HomeGoalStreakStatus[rn] = "On"
          }

          if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Home" & Worm$HomeGoalStreak[rn-1] == 0 & Worm$HomeScoreStreak[rn-1] > 0)
          {
              Worm$HomeScoreStreak[rn] = Worm$HomeScoreStreak[rn-1] + 6
              Worm$HomeGoalStreak[rn] = 1
              
              Worm$HomeScoreStreakStatus[rn] = "Continuing"
              Worm$HomeGoalStreakStatus[rn] = "On"
          }

          if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Home" & Worm$HomeGoalStreak[rn-1] > 0)
          {
              Worm$HomeScoreStreak[rn] = Worm$HomeScoreStreak[rn-1] + 6
              Worm$HomeGoalStreak[rn] = Worm$HomeGoalStreak[rn-1] + 1

              Worm$HomeScoreStreakStatus[rn] = "Continuing"
              Worm$HomeGoalStreakStatus[rn] = "Continuing"
          }

          # Ensure that a behind doesn't end a goal streak for either team
          if (Worm$ScoreType[rn] == "Behind" & Worm$ScoreTeam[rn] == "Home")
          {
              Worm$HomeScoreStreak[rn] = Worm$HomeScoreStreak[rn-1] + 1
              Worm$HomeGoalStreak[rn] = Worm$HomeGoalStreak[rn-1]              
              Worm$AwayGoalStreak[rn] = Worm$AwayGoalStreak[rn-1]              
              
              Worm$HomeScoreStreakStatus[rn] = "Continuing"
              Worm$HomeGoalStreakStatus[rn] = Worm$HomeGoalStreakStatus[rn-1]              
              Worm$AwayGoalStreakStatus[rn] = Worm$AwayGoalStreakStatus[rn-1]              
          }
          
          ##
          if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Away" & Worm$AwayGoalStreak[rn-1] == 0 & Worm$AwayScoreStreak[rn-1] == 0)
          {
              Worm$AwayScoreStreak[rn] = 6
              Worm$AwayGoalStreak[rn] = 1

              Worm$AwayScoreStreakStatus[rn] = "On"
              Worm$AwayGoalStreakStatus[rn] = "On"
          }

          if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Away" & Worm$AwayGoalStreak[rn-1] == 0 & Worm$AwayScoreStreak[rn-1] > 0)
          {
              Worm$AwayScoreStreak[rn] = Worm$AwayScoreStreak[rn-1] + 6
              Worm$AwayGoalStreak[rn] = 1
              
              Worm$AwayScoreStreakStatus[rn] = "Continuing"
              Worm$AwayGoalStreakStatus[rn] = "On"
          }

          if (Worm$ScoreType[rn] == "Goal" & Worm$ScoreTeam[rn] == "Away" & Worm$AwayGoalStreak[rn-1] > 0)
          {
              Worm$AwayScoreStreak[rn] = Worm$AwayScoreStreak[rn-1] + 6
              Worm$AwayGoalStreak[rn] = Worm$AwayGoalStreak[rn-1] + 1

              Worm$AwayScoreStreakStatus[rn] = "Continuing"
              Worm$AwayGoalStreakStatus[rn] = "Continuing"
          }

          # Ensure that a behind doesn't end a goal streak for either team
          if (Worm$ScoreType[rn] == "Behind" & Worm$ScoreTeam[rn] == "Away")
          {
              Worm$AwayScoreStreak[rn] = Worm$AwayScoreStreak[rn-1] + 1
              Worm$AwayGoalStreak[rn] = Worm$AwayGoalStreak[rn-1]              
              Worm$HomeGoalStreak[rn] = Worm$HomeGoalStreak[rn-1]              

              Worm$AwayScoreStreakStatus[rn] = "Continuing"
              Worm$AwayGoalStreakStatus[rn] = Worm$AwayGoalStreakStatus[rn-1]              
              Worm$HomeGoalStreakStatus[rn] = Worm$HomeGoalStreakStatus[rn-1]              
          }
     }

     for (rn in 1:(nrow(Worm)-1))
     {
          if (Worm$HomeScoreStreakStatus[rn] %in% c("On", "Continuing") & Worm$HomeScoreStreakStatus[rn+1] == "Off") { Worm$HomeScoreStreakStatus[rn] = "Ended" }
          if (Worm$HomeGoalStreakStatus[rn] %in% c("On", "Continuing") & Worm$HomeGoalStreakStatus[rn+1] == "Off") { Worm$HomeGoalStreakStatus[rn] = "Ended" }
          if (Worm$AwayScoreStreakStatus[rn] %in% c("On", "Continuing") & Worm$AwayScoreStreakStatus[rn+1] == "Off") { Worm$AwayScoreStreakStatus[rn] = "Ended" }
          if (Worm$AwayGoalStreakStatus[rn] %in% c("On", "Continuing") & Worm$AwayGoalStreakStatus[rn+1] == "Off") { Worm$AwayGoalStreakStatus[rn] = "Ended" }
     }    

     # Any streak still running on the final row ends with the game (checked once, outside the loop)
     if (Worm$HomeScoreStreakStatus[nrow(Worm)] %in% c("On", "Continuing")) { Worm$HomeScoreStreakStatus[nrow(Worm)] = "Ended" }
     if (Worm$HomeGoalStreakStatus[nrow(Worm)] %in% c("On", "Continuing")) { Worm$HomeGoalStreakStatus[nrow(Worm)] = "Ended" }
     if (Worm$AwayScoreStreakStatus[nrow(Worm)] %in% c("On", "Continuing")) { Worm$AwayScoreStreakStatus[nrow(Worm)] = "Ended" }
     if (Worm$AwayGoalStreakStatus[nrow(Worm)] %in% c("On", "Continuing")) { Worm$AwayGoalStreakStatus[nrow(Worm)] = "Ended" }

     Stats = data.frame(GameLength = max(Worm$TotalElapsedTime),
                        HomeGoals = Worm$HomeGoals[Worm$TotalElapsedTime == max(Worm$TotalElapsedTime)],
                        HomeBehinds = Worm$HomeBehinds[Worm$TotalElapsedTime == max(Worm$TotalElapsedTime)],
                        AwayGoals = Worm$AwayGoals[Worm$TotalElapsedTime == max(Worm$TotalElapsedTime)],
                        AwayBehinds = Worm$AwayBehinds[Worm$TotalElapsedTime == max(Worm$TotalElapsedTime)],
                        HomeScore = Worm$HomeScore[Worm$TotalElapsedTime == max(Worm$TotalElapsedTime)],
                        AwayScore = Worm$AwayScore[Worm$TotalElapsedTime == max(Worm$TotalElapsedTime)],
                        HomeMargin = Worm$HomeScore[Worm$TotalElapsedTime == max(Worm$TotalElapsedTime)] - Worm$AwayScore[Worm$TotalElapsedTime == max(Worm$TotalElapsedTime)],
                        HomeLeadTime = HomeLeadTime,
                        AwayLeadTime = AwayLeadTime,
                        DrawnTime = DrawnTime,
                        LeadChangesTotal = sum(Worm$IsLeadChange == "Yes"),
                        LeadChangesQ1 = sum(Worm$IsLeadChange == "Yes" & Worm$Period == 1),
                        LeadChangesQ2 = sum(Worm$IsLeadChange == "Yes" & Worm$Period == 2),
                        LeadChangesQ3 = sum(Worm$IsLeadChange == "Yes" & Worm$Period == 3),
                        LeadChangesQ4 = sum(Worm$IsLeadChange == "Yes" & Worm$Period == 4),
                        AveHomeLead = MarginSecs/max(Worm$TotalElapsedTime),
                        AveLead = AbsMarginSecs/max(Worm$TotalElapsedTime),
                        MaxHomeLead = max(Worm$HomeMargin[Worm$ScoreType != "Period Start"], na.rm = TRUE),
                        LatestTimeOfMaxHomeLead = max(Worm$TotalElapsedTime[Worm$HomeMargin == max(Worm$HomeMargin, na.rm = TRUE)], na.rm = TRUE),
                        MaxAwayLead = -min(Worm$HomeMargin[Worm$ScoreType != "Period Start"], na.rm = TRUE),
                        LatestTimeOfMaxAwayLead =  max(Worm$TotalElapsedTime[Worm$HomeMargin == min(Worm$HomeMargin, na.rm = TRUE)], na.rm = TRUE),
                        TimeLastLeadChange = max(Worm$TotalElapsedTime[Worm$IsLeadChange == "Yes"], na.rm = TRUE),
                        TimeLeadUnderAGoal = UnderSixLeadTime)
                        
     Stats$SSperMinute = (Stats$HomeGoals + Stats$HomeBehinds + Stats$AwayGoals + Stats$AwayBehinds)/Stats$GameLength * 60
     Stats$PointsperMinute = (6*Stats$HomeGoals + Stats$HomeBehinds + 6*Stats$AwayGoals + Stats$AwayBehinds)/Stats$GameLength * 60
     Stats$PointsperSS = Stats$PointsperMinute/Stats$SSperMinute

     Stats$NumberHomeGoalStreaks = sum(Worm$HomeGoalStreakStatus == "Ended", na.rm = TRUE)
     Stats$AveLengthHomeGoalStreaks = sum(Worm$HomeGoalStreak[Worm$HomeGoalStreakStatus == "Ended"], na.rm = TRUE)/Stats$NumberHomeGoalStreaks
     Stats$MaxLengthHomeGoalStreaks = max(Worm$HomeGoalStreak[Worm$HomeGoalStreakStatus == "Ended"], na.rm = TRUE)
     
     Stats$NumberHomeScoreStreaks = sum(Worm$HomeScoreStreakStatus == "Ended", na.rm = TRUE)
     Stats$AvePtsHomeScoreStreaks = sum(Worm$HomeScoreStreak[Worm$HomeScoreStreakStatus == "Ended"], na.rm = TRUE)/Stats$NumberHomeScoreStreaks
     Stats$MaxPtsHomeScoreStreaks = max(Worm$HomeScoreStreak[Worm$HomeScoreStreakStatus == "Ended"], na.rm = TRUE)
     
     Stats$NumberAwayGoalStreaks = sum(Worm$AwayGoalStreakStatus == "Ended", na.rm = TRUE)
     Stats$AveLengthAwayGoalStreaks = sum(Worm$AwayGoalStreak[Worm$AwayGoalStreakStatus == "Ended"], na.rm = TRUE)/Stats$NumberAwayGoalStreaks
     Stats$MaxLengthAwayGoalStreaks = max(Worm$AwayGoalStreak[Worm$AwayGoalStreakStatus == "Ended"], na.rm = TRUE)
     
     Stats$NumberAwayScoreStreaks = sum(Worm$AwayScoreStreakStatus == "Ended", na.rm = TRUE)
     Stats$AvePtsAwayScoreStreaks = sum(Worm$AwayScoreStreak[Worm$AwayScoreStreakStatus == "Ended"], na.rm = TRUE)/Stats$NumberAwayScoreStreaks
     Stats$MaxPtsAwayScoreStreaks = max(Worm$AwayScoreStreak[Worm$AwayScoreStreakStatus == "Ended"], na.rm = TRUE)

     return(Stats)                                      
 
}

I won’t discuss anything from this code since this blog is long enough already and, hopefully, much of the code is self-explanatory.
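For anyone who’d like one of the ideas unpacked a little, though, here’s a minimal sketch of the goal streak concept, where a behind by either team doesn’t end a streak of goals. It isn’t the method used in the function above: it swaps the row-by-row loop for base R’s rle(), and the toy_worm data, and the assumption that a single goal counts as a streak of length 1, are mine for illustration only.

```r
# Toy (made-up) scoring sequence with just the two columns we need
toy_worm = data.frame(ScoreTeam = c("Home", "Home", "Home", "Away", "Home", "Away", "Away", "Home"),
                      ScoreType = c("Goal", "Behind", "Goal", "Behind", "Goal", "Goal", "Goal", "Goal"))

# Keep only the goals, so that behinds cannot interrupt a goal streak
goals_only = toy_worm[toy_worm$ScoreType == "Goal", ]

# Run-length encode the scoring team: each run is one goal streak
runs = rle(as.character(goals_only$ScoreTeam))
goal_streaks = data.frame(Team = runs$values, Length = runs$lengths)
print(goal_streaks)

# Per-team summaries, analogous in spirit to the streak columns in Stats
NumberHomeGoalStreaks = sum(goal_streaks$Team == "Home")
MaxLengthHomeGoalStreaks = max(goal_streaks$Length[goal_streaks$Team == "Home"])
AveLengthAwayGoalStreaks = mean(goal_streaks$Length[goal_streaks$Team == "Away"])
```

In this toy sequence the Home team’s first three goals form a single streak even though a behind from each team falls between them.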

To finish, then, here’s how we call this code in parallel.

# Start a cluster of 20 workers; reduce this number to suit the cores available on your machine
registerDoParallel(cl <- makeCluster(20))

sim_results_list = foreach(rep_num = 1:reps, .packages = c("dplyr", "HDInterval")) %dopar% {

    GS = GameStats(sim_results %>% filter(RepNum == rep_num))
    GS$RepNum = rep_num
    return(GS)
}   

stopCluster(cl)

sim_game_stats = do.call(rbind.data.frame, sim_results_list)
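If you’d like to see the shape of that split-summarise-combine step without spinning up a cluster, here’s a rough sequential sketch using only base R. The sim_results_toy data and the ToyStats function are hypothetical stand-ins that I’ve invented for illustration (the real GameStats function needs a full worm), so treat this as a sketch of the pattern rather than a replacement for the parallel call.

```r
# Toy (made-up) stand-in for sim_results: three replicates with a few scoring events each
sim_results_toy = data.frame(RepNum = c(1, 1, 1, 2, 2, 3),
                             TotalElapsedTime = c(100, 250, 400, 90, 300, 500),
                             Points = c(6, 1, 6, 6, 6, 1))

# Hypothetical summariser standing in for GameStats()
ToyStats = function(Worm)
{
    data.frame(GameLength = max(Worm$TotalElapsedTime),
               TotalPoints = sum(Worm$Points))
}

# Sequential equivalent of the foreach loop: one single-row summary per replicate
toy_results_list = lapply(split(sim_results_toy, sim_results_toy$RepNum), function(Worm)
{
    GS = ToyStats(Worm)
    GS$RepNum = Worm$RepNum[1]
    GS
})

# Stack the single-row summaries into one dataframe, as in the parallel version
toy_game_stats = do.call(rbind.data.frame, toy_results_list)
print(toy_game_stats)
```

Splitting the data once by replicate, as here, also avoids filtering the full set of results separately for every replicate, which might be worth trying if the parallel version feels slow.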

The table at right compares the results of my simulations of the 2024 season with those for the actual games of 2024.

In general, the differences are relatively minor, although it does appear that the simulated games are a little less streaky.

The differences in the conversion rates can be entirely attributed to the MoSHBODS inputs, which set expected conversion rates at about 52.6%.
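To put that 52.6% figure in context, here’s a small sketch of the arithmetic linking a conversion rate to points per scoring shot, which is the same quantity as the PointsperSS column calculated earlier. The toy goal and behind counts are made up for illustration and don’t come from any actual or simulated game.

```r
# Expected conversion rate quoted above (goals as a proportion of scoring shots)
conversion_rate = 0.526

# A goal is worth 6 points and a behind 1 point, so the expected points per scoring shot is
points_per_ss = 6 * conversion_rate + 1 * (1 - conversion_rate)
print(points_per_ss)

# The same quantities from toy (made-up) game totals
toy = data.frame(HomeGoals = 12, HomeBehinds = 10, AwayGoals = 10, AwayBehinds = 12)

toy_goals = toy$HomeGoals + toy$AwayGoals
toy_shots = toy_goals + toy$HomeBehinds + toy$AwayBehinds

toy_conversion = toy_goals / toy_shots
toy_points_per_ss = (6 * toy_goals + toy$HomeBehinds + toy$AwayBehinds) / toy_shots

# Points per scoring shot is always 1 + 5 x conversion rate
print(c(toy_conversion, toy_points_per_ss, 1 + 5 * toy_conversion))
```

At a 52.6% conversion rate, then, each scoring shot is worth about 3.63 points on average.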

Below is a link where you can download the code and the MoSHBODS 2024 data. Please let me know if you’ve any questions or suggestions.

Score Progression Code Link