Abstract

We evaluate the performance of various selection methods for the Monte Carlo Tree Search algorithm in two-player zero-sum extensive-form games with imperfect information. We compare the standard Upper Confidence Bounds applied to Trees (UCT) along with the less common Exponential Weights for Exploration and Exploitation (Exp3) and novel Regret matching (RM) selection in two distinct imperfect information games: Imperfect Information Goofspiel and Phantom Tic-Tac-Toe. We show that after an initial fast convergence towards a Nash equilibrium, UCT computes increasingly worse strategies. This is not the case with Exp3 and RM, which also show superior performance in head-to-head matches.

Highlights

  • Monte Carlo Tree Search (MCTS) is a family of sample-based tree search algorithms that has recently led to a significant improvement in the quality of state-of-the-art solvers for perfect information problems, such as the game of Go [1] or domain-independent planning [2]

  • In an imperfect information variant of the game of Goofspiel and in Phantom Tic-Tac-Toe, we show that these alternative selection strategies allow Information Set Monte Carlo Tree Search (IS-MCTS) to converge closer to the Nash equilibrium strategy and perform better in head-to-head matches

  • Upper Confidence Bounds applied to Trees (UCT) is a successful selection function for perfect information problems, but it has been shown to converge to an exploitable strategy in a simultaneous move game [18], which is a special case of imperfect information games
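For context, the standard UCT selection rule picks the child maximizing an average return plus an exploration bonus. The sketch below is my own minimal illustration of that generic rule (the function name and the deterministic tie-breaking are assumptions, not the paper's implementation):

```python
import math

def uct_select(child_returns, child_visits, parent_visits, c=math.sqrt(2)):
    """Generic UCT selection: maximize mean return + exploration bonus.

    child_returns[i] is the sum of returns sampled through child i;
    unvisited children are expanded before any scored child.
    """
    best, best_score = None, -float("inf")
    for i, n in enumerate(child_visits):
        if n == 0:
            return i  # always try unvisited children first
        score = child_returns[i] / n + c * math.sqrt(math.log(parent_visits) / n)
        if score > best_score:
            best, best_score = i, score
    return best
```

Note that this rule is deterministic given the statistics, which is one intuition behind the highlight above: in games whose equilibria are mixed, a selection function that concentrates on a single "best" action can be exploited.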


Summary

Introduction

Monte Carlo Tree Search (MCTS) is a family of sample-based tree search algorithms that has recently led to a significant improvement in the quality of state-of-the-art solvers for perfect information problems, such as the game of Go [1] or domain-independent planning [2]. The most important complication in imperfect information games is that the optimal strategies may require the players to make randomized decisions. Always playing the same action can be exploited by the opponent; in a game like Rock-Paper-Scissors, the optimal strategy against a rational opponent is to play each action with the same probability. Another important complication is the strong inter-dependency between the strategies in different parts of the game. We analyze various selection functions in Information Set Monte Carlo Tree Search (IS-MCTS) [3]. In an imperfect information variant of the game of Goofspiel and in Phantom Tic-Tac-Toe, we show that these alternative selection strategies allow IS-MCTS to converge closer to the Nash equilibrium strategy and perform better in head-to-head matches.
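The regret matching (RM) selection mentioned in the abstract naturally produces the randomized decisions described above: it mixes over actions in proportion to their accumulated positive regret. The following is a minimal sketch of the generic regret-matching update (function names and the simultaneous-feedback form are my assumptions, not the paper's code):

```python
def regret_matching_policy(regrets):
    """Mixed strategy proportional to positive cumulative regrets."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret accumulated yet: play uniformly.
    n = len(regrets)
    return [1.0 / n] * n

def update_regrets(regrets, utilities, chosen):
    """Accumulate, for each action, the regret for not having played it
    instead of the chosen action (assumes per-action utilities are known)."""
    for a, u in enumerate(utilities):
        regrets[a] += u - utilities[chosen]
    return regrets
```

Because the policy stays mixed whenever several actions carry positive regret, this kind of selection avoids the collapse onto a single pure action that makes deterministic selection exploitable in imperfect information games.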

Information Set Monte Carlo Tree Search
Exponential Weights for Exploration and Exploitation
Regret Matching
Experimental Evaluation
Convergence to Nash Equilibrium
Head to Head Matches
Findings
Conclusions
