Finding an Optimal Strategy for Shut The Box: Part 2
/This is the second in a series of posts that investigates the creation of an optimal strategy for the game Shut the Box, the first of which I’d recommend you read before continuing.
MEASURING STRATEGY PERFORMANCE
In the previous blog we created a dataframe names GameSpace that held 8,599 different strategies, each summarising an action taken when faced with a given set of open boxes and a given dice roll - that is, a strategy.
In some cases, there’s only one strategy that applies - say, for example, we roll a total of 6 and the open boxes are 1, 2, 6, 7, and 8 - but in other cases we’ll have a choice to make. It’s these choices that we want to optimise.
To begin with, we’re going to play a large number of games where, whenever we have a choice, we pick at random with equal weights amongst those available strategies, oblivious to how they might help or harm our chances of winning.
The first bit of code to do this appears below and, roughly speaking does the following:
Reads in the GameSpace CSV
Creates a cluster of 10 parallel threads that, between them, will handle the 5 million simulations (you might need to reduce this depending on the computing power available to you)
Creates a Results datafame for each simulation that will allow us to track, step-by-step, the strategies used within a single replicate. The primary purpose for doing this is to allow us to associate each strategy with a win or a loss once that outcome has been determined.
Simulates the playing of a single game of Shut the Box and returns the result in Results. At each step, as mentioned, we choose at random, with equal probabilities, amongst the available strategies
Outside the foreach loop row binds (ie stacks) all of the Results dataframes, each having an identifying GameNum
Purely for interest’s sake, times the whole activity
library(dplyr)
library(doParallel)
GameSpace = read.csv('GameSpace.csv')
NumGames = 5000000
registerDoParallel(cl <- makeCluster(10))
t0 = Sys.time()
results_list = foreach(GameNum = 1:NumGames, .packages = c("dplyr")) %dopar% {
res_row = 0
Results = data.frame(GameNum = rep(0,10),
TossNum = rep(0,10),
StateNum = rep("",10),
TotalRolled = rep(0,10),
StrategyChosen = rep(0,10),
TargetSum = rep(0,10),
Result = rep("",10))
res_row = res_row + 1
Results$GameNum[res_row] = GameNum
Results$StateNum[res_row] = GameSpace$StateNum[GameSpace$Open == "123456789"][1]
Results$TossNum[res_row] = 1
ThisResult = "Undetermined"
while (ThisResult == "Undetermined")
{
Results$TotalRolled[res_row] = sample(1:6,1) + sample(1:6,1)
CandidateStrategies = GameSpace$StrategyNum[GameSpace$StateNum == Results$StateNum[res_row] & GameSpace$TotalRolled == Results$TotalRolled[res_row]]
if (length(CandidateStrategies) == 1) { Results$StrategyChosen[res_row] = CandidateStrategies } else { Results$StrategyChosen[res_row] = sample(CandidateStrategies, 1) }
# Check for Win / Loss
if(GameSpace$Outcome[GameSpace$StrategyNum == Results$StrategyChosen[res_row]] == "Win")
{
Results$Result[res_row] = "Win"
ThisResult = "Win"
} else
{
if(GameSpace$Outcome[GameSpace$StrategyNum == Results$StrategyChosen[res_row]] == "Loss")
{
Results$Result[res_row] = "Loss"
ThisResult = "Loss"
}
}
if(ThisResult == "Undetermined")
{
Results$Result[res_row] = "Undetermined"
res_row = res_row + 1
Results$GameNum[res_row] = GameNum
Results$StateNum[res_row] = GameSpace %>% filter(X1 == GameSpace$F1[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X2 == GameSpace$F2[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X3 == GameSpace$F3[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X4 == GameSpace$F4[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X5 == GameSpace$F5[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X6 == GameSpace$F6[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X7 == GameSpace$F7[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X8 == GameSpace$F8[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X9 == GameSpace$F9[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]]) %>% select(StateNum) %>% unique()
Results$TossNum[res_row] = Results$TossNum[res_row-1] + 1
}
}
return(Results[1:res_row,])
}
stopCluster(cl)
AllResults = do.call(rbind.data.frame, results_list)
print(Sys.time() - t0)
There’s a little bit of admin to do with the AllResults dataframe because StateNum has come back as a list, and TotalRolled as a numeric (and not as an integer, which it is in GameSpace and we’ll need to match for joining purposes).
Once we’ve attended to that we, roughly speaking:
Append some of the extra detail about each strategy held in GameSpace
Append the final outcome of a particular game - win or loss - to each row related to that particular game (using GameNum)
For interest, calculate and display the average winning rate across the simulated games
Append the Removed information from GameSpace (also mainly for convenience)
Create and write out a final StrategyPerformance dataframe that, as the name suggests, summarises the win/loss performance of each strategy whenever it has been used across the 5 million game simulation replicates
AllResults$StateNum = unlist(AllResults$StateNum)
AllResults$TotalRolled = as.integer(AllResults$TotalRolled)
AllResults = left_join(AllResults, GameSpace[,c("StateNum" , "TotalRolled", "RivalGroup", "StrategyCount", "Open")],
by = c("StateNum" = "StateNum", "TotalRolled" = "TotalRolled"), relationship = "many-to-many")
Outcomes = AllResults %>% group_by(GameNum) %>% summarise(OverallResult = ifelse(sum(Result == "Win") > 0, "Win", "Loss"))
AllResults = left_join(AllResults, Outcomes, by = c("GameNum" = "GameNum"), multiple = "all")
WinLoss = AllResults %>% group_by(GameNum) %>% summarise(WinCount = sum(Result == "Win") > 0)
mean(WinLoss$WinCount)
AllResults = left_join(AllResults, GameSpace[,c("StrategyNum", "Removed")], by = c("StrategyChosen" = "StrategyNum"))
#
StrategyPerformance = AllResults %>% group_by(StateNum, Open, TotalRolled, StrategyChosen, Removed, RivalGroup, StrategyCount) %>%
summarise(Occurrences = length(GameNum), WinRate = mean(OverallResult == "Win"))
write.csv(StrategyPerformance, 'StrategyPerformance_Random.csv', row.names = FALSE)
Deploying this relatively naive strategy results in wins about 7% of the time.
The StrategyPerfomance dataframe allows us to answer - at least in a preliminary way, because the ultimate success of any strategy is dependent on the strategies that are deployed after it, and so far we haven’t optimised those - questions about which strategies seem to be better than their rivals.
For example, on the first roll:
A roll of 2 through 9 is optimised by closing a single box equal to that total
A roll of 10 is optimised by closing boxes 1 and 9
A roll of 11 is perhaps optimised by closing boxes 2 and 9, or boxes 5 and 6
A roll of 12 is optimised by closing boxes 4 and 8
And, if faced with the state “3456789”:
A roll of 2 results in a loss
A roll of 3 through 6 requires us to choose the only available strategy, which involves closing a single box equal to the total
A roll of 7 is optimised by closing box 7 only
A roll of 8 is optimised by closing box 8 only
A roll of 9 is optimised by closing box 9 only
A roll of 10 is optimised by closing boxes 4 and 6
A roll of 11 is optimised by closing boxes 5 and 6
A roll of 12 is perhaps optimised by closing boxes 4 and 8, or 5 and 7
(Note that, for this series of blogs we are, for the most part, going to assume that a strategy that outperforms its rivals when decisions are made at random after it has been chosen, can also be assumed to outperform its rivals when optimal decisions are made after it has been chosen. We might, in a future blog, explore this assumption.)