Finding an Optimal Strategy for Shut The Box: Part 4
This is the fourth in a series of posts that investigates the creation of an optimal strategy for the game Shut the Box, and I’d recommend you read the first three blogs before continuing, starting with Part 1.
So far we have:
In Part 1
defined the rules of the version of Shut the Box that we’re exploring
created an initial GameSpace dataframe that contains all 506 valid states that the 9 boxes might be in prior to a move and, for each of them, the valid moves that might be made for a given rolled total (the Strategies)
removed redundant rows that relate to moves from the same state and with the same rolled total that result in the same outcome (viz, a win). For example, when the open box state is 123 and a total of 3 is rolled, closing boxes 1 and 2, or just box 3, both result in a win, so we keep only one of those strategies
added a grouping variable called RivalGroup that links alternative strategies for the same box state and rolled total
we end with a GameSpace dataframe containing 8,599 rows, and our goal over subsequent activities is to whittle that down to 5,566 rows, which is a single row for each combination of box state and total rolled, that row being the best strategy for that particular situation
In Part 2
estimated the performance of each strategy across 5m simulation replicates assuming that, when faced with two or more alternatives (strategies), we choose amongst them at random with equal weights
the performance of each strategy is based on how often, after its use, the final outcome is a win versus a loss
we end with a StrategyPerformance dataframe that records the number of times each strategy was deployed across the 5m replicates, and the proportion of times that its use resulted in a win
we also end with an estimate of the overall win rate for someone following the naive strategy described here of choosing amongst rival strategies at random, which is about 7%
In Part 3
cull rival strategies on the basis of statistical significance (making the possibly brave assumption that the best strategies can be determined by their relative performance when deployed in the naive manner described above)
identify those strategies that have been used fewer than 50 times across the 5m replicates and create new replicates that start with those strategies being deployed
cull rival strategies on the basis of statistical significance
identify those strategies that have been used fewer than 500 times across the 5m replicates and create new replicates that start with those strategies being deployed
cull rival strategies on the basis of statistical significance
identify those strategies that have been used fewer than 1,000 times across the 5m replicates and create new replicates that start with those strategies being deployed
cull rival strategies on the basis of statistical significance
we end this portion (at least in my case) with a GameSpace dataframe containing 5,766 rows and with 194 rivalries undecided (188 involving just two strategies and 6 involving three strategies)
we resolve the remaining rivalries by choosing the strategy that has the highest win rate across all of the simulation replicates run so far
Do we definitely have THE optimum set of strategies? No. But we certainly have a very good set of strategies.
A FINAL RUN
So far, all our performance estimates have come from using the strategies in an environment where strategies have been chosen at random from amongst those that remain at a given point in time.
In order to get a final, better estimate of the win rate associated with each of the chosen strategies (and the overall win rate) if these strategies are used, we’ll do one last run of this code from Part 2.
library(dplyr)
library(doParallel)
GameSpace = read.csv('GameSpace_Optimised.csv')
NumGames = 5000000
registerDoParallel(cl <- makeCluster(10))
t0 = Sys.time()
results_list = foreach(GameNum = 1:NumGames, .packages = c("dplyr")) %dopar% {
res_row = 0
Results = data.frame(GameNum = rep(0,10),
TossNum = rep(0,10),
StateNum = rep("",10),
TotalRolled = rep(0,10),
StrategyChosen = rep(0,10),
TargetSum = rep(0,10),
Result = rep("",10))
res_row = res_row + 1
Results$GameNum[res_row] = GameNum
Results$StateNum[res_row] = GameSpace$StateNum[GameSpace$Open == "123456789"][1]
Results$TossNum[res_row] = 1
ThisResult = "Undetermined"
while (ThisResult == "Undetermined")
{
Results$TotalRolled[res_row] = sample(1:6,1) + sample(1:6,1)
Results$StrategyChosen[res_row] = GameSpace$StrategyNum[GameSpace$StateNum == Results$StateNum[res_row] & GameSpace$TotalRolled == Results$TotalRolled[res_row]]
# Check for Win / Loss
if(GameSpace$Outcome[GameSpace$StrategyNum == Results$StrategyChosen[res_row]] == "Win")
{
Results$Result[res_row] = "Win"
ThisResult = "Win"
} else
{
if(GameSpace$Outcome[GameSpace$StrategyNum == Results$StrategyChosen[res_row]] == "Loss")
{
Results$Result[res_row] = "Loss"
ThisResult = "Loss"
}
}
if(ThisResult == "Undetermined")
{
Results$Result[res_row] = "Undetermined"
res_row = res_row + 1
Results$GameNum[res_row] = GameNum
Results$StateNum[res_row] = GameSpace %>% filter(X1 == GameSpace$F1[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X2 == GameSpace$F2[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X3 == GameSpace$F3[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X4 == GameSpace$F4[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X5 == GameSpace$F5[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X6 == GameSpace$F6[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X7 == GameSpace$F7[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X8 == GameSpace$F8[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]],
X9 == GameSpace$F9[GameSpace$StrategyNum == Results$StrategyChosen[res_row-1]]) %>% select(StateNum) %>% unique()
Results$TossNum[res_row] = Results$TossNum[res_row-1] + 1
}
}
return(Results[1:res_row,])
}
stopCluster(cl)
AllResults = do.call(rbind.data.frame, results_list)
print(Sys.time() - t0)
AllResults$StateNum = unlist(AllResults$StateNum)
AllResults$TotalRolled = as.integer(AllResults$TotalRolled)
AllResults = left_join(AllResults, GameSpace[,c("StateNum" , "TotalRolled", "RivalGroup", "StrategyCount", "Open")],
by = c("StateNum" = "StateNum", "TotalRolled" = "TotalRolled"), relationship = "many-to-many")
Outcomes = AllResults %>% group_by(GameNum) %>% summarise(OverallResult = ifelse(sum(Result == "Win") > 0, "Win", "Loss"))
AllResults = left_join(AllResults, Outcomes, by = c("GameNum" = "GameNum"), multiple = "all")
WinLoss = AllResults %>% group_by(GameNum) %>% summarise(WinCount = sum(Result == "Win") > 0)
mean(WinLoss$WinCount)
AllResults = left_join( AllResults, GameSpace[,c("StrategyNum", "Removed", "Outcome")], by = c("StrategyChosen" = "StrategyNum"))
OptimalStrategyPerformance = AllResults %>% group_by(RivalGroup, StateNum, Open, TotalRolled, StrategyNum, Removed) %>%
summarise(Occurrences = length(GameNum), WinRate = mean(OverallResult == "Win"))
write.csv(OptimalStrategyPerformance, 'Final Best Strategies and Estimated Win Rates.csv', row.names = FALSE)
Our final estimate of the WinRate we can expect using our chosen strategies is about 17%. So, by playing optimally, we have increased our chances compared to playing naively from about 7% to 17%.
Also, OptimalStrategyPerformance contains an estimate of the win rate associated with each strategy.
On closer inspection, it turns out that OptimalStrategyPerformance contains only 5,115 rows, which is 451 fewer than GameSpace’s 5,566. At first I put this down to some strategies not being called upon, by chance, across 5 million replicates, but some deeper thought made me realise that the choice of some strategies over others, particularly in early rolls, makes certain subsequent states (and their associated strategies) unreachable.
For example, the optimal strategy - according to the work we’ve done so far - when rolling a total of 9 on the very first roll is to close box 9. That makes it impossible to get into the State “2345679” and to choose any of the strategies that are considered optimal there for different rolled totals. All up, there are 41 states that are unreachable, and so the 11 strategies associated with each of those 41 states are not required. Hence the 451 “missing” rows in OptimalStrategyPerformance.
The code to find these States is as follows:
GameSpace = read.csv('GameSpace_Optimised.csv')
# Determine the next State for each Strategy
GameSpace$OpenAfterMove = paste(ifelse(GameSpace$F1 == 999, "", GameSpace$F1),
ifelse(GameSpace$F2 == 999, "", GameSpace$F2),
ifelse(GameSpace$F3 == 999, "", GameSpace$F3),
ifelse(GameSpace$F4 == 999, "", GameSpace$F4),
ifelse(GameSpace$F5 == 999, "", GameSpace$F5),
ifelse(GameSpace$F6 == 999, "", GameSpace$F6),
ifelse(GameSpace$F7 == 999, "", GameSpace$F7),
ifelse(GameSpace$F8 == 999, "", GameSpace$F8),
ifelse(GameSpace$F9 == 999, "", GameSpace$F9), sep = "")
GameSpace$StateNumAfterMove = ""
for (RN in 1:nrow(GameSpace))
{
if((length(GameSpace$Open == GameSpace$OpenAfterMove[RN]) > 0) & (GameSpace$Open[RN] != GameSpace$OpenAfterMove[RN]))
{
GameSpace$StateNumAfterMove[RN] = GameSpace$StateNum[GameSpace$Open == GameSpace$OpenAfterMove[RN]][1]
}
}
InitialStateNum = GameSpace$StateNum[GameSpace$Open == "123456789"][1]
PossibleStatesAfterRoll_1 = unique(GameSpace$StateNumAfterMove[GameSpace$StateNum %in% InitialStateNum])
PossibleStatesAfterRoll_2 = unique(GameSpace$StateNumAfterMove[GameSpace$StateNum %in% PossibleStatesAfterRoll_1])
PossibleStatesAfterRoll_3 = unique(GameSpace$StateNumAfterMove[GameSpace$StateNum %in% PossibleStatesAfterRoll_2])
PossibleStatesAfterRoll_4 = unique(GameSpace$StateNumAfterMove[GameSpace$StateNum %in% PossibleStatesAfterRoll_3])
PossibleStatesAfterRoll_5 = unique(GameSpace$StateNumAfterMove[GameSpace$StateNum %in% PossibleStatesAfterRoll_4])
PossibleStatesAfterRoll_6 = unique(GameSpace$StateNumAfterMove[GameSpace$StateNum %in% PossibleStatesAfterRoll_5])
PossibleStatesAfterRoll_7 = unique(GameSpace$StateNumAfterMove[GameSpace$StateNum %in% PossibleStatesAfterRoll_6])
AllVisitedStates = unique(c(InitialStateNum, PossibleStatesAfterRoll_1, PossibleStatesAfterRoll_2, PossibleStatesAfterRoll_3, PossibleStatesAfterRoll_4,
PossibleStatesAfterRoll_5, PossibleStatesAfterRoll_6, PossibleStatesAfterRoll_7))
MissedStates = setdiff(GameSpace$StateNum, AllVisitedStates)
length(MissedStates)
This code essentially walks down the tree of states that are possible after each roll of the dice when following the optimal strategies from the previous roll, and then creates a union of all states that are reached by at least one path.
RARELY USED STRATEGIES
In order for us to get an unbiased estimate of our long-term winning rate using our chosen strategies, we had to run the 5 million replicates as if they were regular games. One consequence of this is that some strategies are deployed a lot less often than others, so our estimate of the win rate associated with them is less precise.
As an example, in my case Strategies 2710 and 2699, which are both used when the State is 3569 (StateNumber S229) but, respectively, when a 12 or a 2 is rolled, were deployed only 8 and 11 times respectively.
For our last step, then, we’re going to address that scarcity by starting games at one of the under-represented strategies - just as we did in Part 3 - to ensure that every strategy has been deployed at least 1,000 times.
The familiar looking code to do this is as follows:
# Reload the original GameSpace
GS_Orig = read.csv('GameSpace.csv')
OptimalStrategyPerformance = left_join(OptimalStrategyPerformance, GS_Orig[,c("StrategyNum",
"X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9",
"P1", "P2", "P3", "P4",
"F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8", "F9")],
by = c("StrategyChosen" = "StrategyNum"))
run_specific_strategies = function(GS, CountRequired)
{
max_score_to_win = 3
# Find all the strategies with fewer than CountRequired occurrences
RarelyUsedStrategies = GS %>% filter(Occurrences < CountRequired) %>% select(StateNum, TotalRolled, StrategyChosen, Occurrences)
completed = 0
for (RUS in RarelyUsedStrategies$StrategyChosen)
{
for (GameNum in 1:(CountRequired - RarelyUsedStrategies$Occurrences[RarelyUsedStrategies$StrategyChosen == RUS]))
{
res_row = 0
Results = data.frame(GameNum = rep(0,10),
TossNum = rep(0,10),
StateNum = rep("",10),
TotalRolled = rep(0,10),
StrategyChosen = rep(0,10),
TargetSum = rep(0,10),
Result = rep("",10))
res_row = res_row + 1
Results$GameNum[res_row] = GameNum
Results$StateNum[res_row] = RUS
Results$TossNum[res_row] = 1
ThisResult = "Undetermined"
while (ThisResult == "Undetermined")
{
if(res_row == 1)
{
Results$TotalRolled[res_row] = RarelyUsedStrategies$TotalRolled[RarelyUsedStrategies$StrategyChosen == RUS]
CandidateStrategies = RUS
} else
{
Results$TotalRolled[res_row] = sample(1:6,1) + sample(1:6,1)
CandidateStrategies = GS$StrategyChosen[GS$StateNum == Results$StateNum[res_row] & GS$TotalRolled == Results$TotalRolled[res_row]]
}
Results$StrategyChosen[res_row] = CandidateStrategies
# Check for win
Results$TargetSum[res_row] = ifelse(GS$F1[GS$StrategyChosen == Results$StrategyChosen[res_row]] != 999, GS$F1[GS$StrategyChosen == Results$StrategyChosen[res_row]], 0) +
ifelse(GS$F2[GS$StrategyChosen == Results$StrategyChosen[res_row]] != 999, GS$F2[GS$StrategyChosen == Results$StrategyChosen[res_row]], 0) +
ifelse(GS$F3[GS$StrategyChosen == Results$StrategyChosen[res_row]] != 999, GS$F3[GS$StrategyChosen == Results$StrategyChosen[res_row]], 0) +
ifelse(GS$F4[GS$StrategyChosen == Results$StrategyChosen[res_row]] != 999, GS$F4[GS$StrategyChosen == Results$StrategyChosen[res_row]], 0) +
ifelse(GS$F5[GS$StrategyChosen == Results$StrategyChosen[res_row]] != 999, GS$F5[GS$StrategyChosen == Results$StrategyChosen[res_row]], 0) +
ifelse(GS$F6[GS$StrategyChosen == Results$StrategyChosen[res_row]] != 999, GS$F6[GS$StrategyChosen == Results$StrategyChosen[res_row]], 0) +
ifelse(GS$F7[GS$StrategyChosen == Results$StrategyChosen[res_row]] != 999, GS$F7[GS$StrategyChosen == Results$StrategyChosen[res_row]], 0) +
ifelse(GS$F8[GS$StrategyChosen == Results$StrategyChosen[res_row]] != 999, GS$F8[GS$StrategyChosen == Results$StrategyChosen[res_row]], 0) +
ifelse(GS$F9[GS$StrategyChosen == Results$StrategyChosen[res_row]] != 999, GS$F9[GS$StrategyChosen == Results$StrategyChosen[res_row]], 0)
if(Results$TargetSum[res_row] <= max_score_to_win)
{
Results$Result[res_row] = "Win"
ThisResult = "Win"
for (ss in 1:res_row)
{
GS$Occurrences[GS$StrategyChosen == Results$StrategyChosen[ss]] = GS$Occurrences[GS$StrategyChosen == Results$StrategyChosen[ss]] + 1
GS$TimesWin[GS$StrategyChosen == Results$StrategyChosen[ss]] = GS$TimesWin[GS$StrategyChosen == Results$StrategyChosen[ss]] + 1
GS$WinRate = ifelse(GS$Occurrences == 0, NA, GS$TimesWin/GS$Occurrences)
}
} else
{
if(is.na(GS$P1[GS$StrategyChosen == Results$StrategyChosen[res_row]]))
{
Results$Result[res_row] = "Lost"
ThisResult = "Lost"
for (ss in 1:res_row)
{
GS$Occurrences[GS$StrategyChosen == Results$StrategyChosen[ss]] = GS$Occurrences[GS$StrategyChosen == Results$StrategyChosen[ss]] + 1
GS$WinRate = ifelse(GS$Occurrences == 0, NA, GS$TimesWin/GS$Occurrences)
}
}
}
if(ThisResult == "Undetermined")
{
Results$Result[res_row] = "Undetermined"
res_row = res_row + 1
Results$StateNum[res_row] = GS %>% ungroup() %>% filter(X1 == GS$F1[GS$StrategyChosen == Results$StrategyChosen[res_row-1]],
X2 == GS$F2[GS$StrategyChosen == Results$StrategyChosen[res_row-1]],
X3 == GS$F3[GS$StrategyChosen == Results$StrategyChosen[res_row-1]],
X4 == GS$F4[GS$StrategyChosen == Results$StrategyChosen[res_row-1]],
X5 == GS$F5[GS$StrategyChosen == Results$StrategyChosen[res_row-1]],
X6 == GS$F6[GS$StrategyChosen == Results$StrategyChosen[res_row-1]],
X7 == GS$F7[GS$StrategyChosen == Results$StrategyChosen[res_row-1]],
X8 == GS$F8[GS$StrategyChosen == Results$StrategyChosen[res_row-1]],
X9 == GS$F9[GS$StrategyChosen == Results$StrategyChosen[res_row-1]]) %>% select(StateNum) %>% unique()
Results$TossNum[res_row] = Results$TossNum[res_row-1] + 1
}
}
}
completed = completed + 1
print(paste("Completed ", completed, " of ", length(RarelyUsedStrategies$StrategyChosen), " strategies", sep = ""))
}
return(GS)
}
#########
OptimalStrategyPerformance$TimesWin = round(OptimalStrategyPerformance$Occurrences * OptimalStrategyPerformance$WinRate, 0)
OptimalStrategyPerformance = run_specific_strategies(OptimalStrategyPerformance, 1000)
write.csv(OptimalStrategyPerformance, 'Final Best Strategies and Estimated Win Rates - Boosted Sample.csv', row.names = FALSE)
For me, there were 1,259 Strategies that had not been used at least 1,000 times, so this code took a while to run.
We end, at last, with a set of strategies with which to play Shut The Box optimally (or at least very well) and, in the next and final blog, we’ll see what general observations we can make about those strategies (Spoiler: not a lot).