Abstract

Monte Carlo Tree Search (MCTS) algorithms show outstanding strengths in decision-making problems such as the game of Go. However, MCTS requires significant computing loads to evaluate many nodes in the decision tree to make a good decision. Parallelizing MCTS node evaluations is challenging because MCTS is a sequential process that each round of tree traversal depends on the previous node evaluations. In this work, we present SpecMCTS , a new approach for accelerating MCTS by speculatively traversing the search tree. Many MCTS applications, such as AlphaGo Zero, use a deep neural network (DNN) model to evaluate the tree nodes during the search. SpecMCTS uses a pair of DNN models, the speculation model and the main model . The faster (but less accurate) speculation model accelerates the sequential tree search while the more accurate main model improves the decision quality. SpecMCTS accelerates MCTS for the game of Go by up to $2.09\times {}$ on the NVIDIA T4 GPU. This performance improvement can be translated into a better decision quality by performing a larger number of tree traversals within the time limit. For a fixed decision time, SpecMCTS shows stronger gameplay (higher win rate) than the original sequential MCTS and state-of-the-art MCTS parallelization approaches.

Highlights

  • Monte Carlo Tree Search (MCTS) demonstrated its effectiveness in complex control domains that require future planning, such as video games [1] and the game of Go [2]–[4]

  • Regardless of the node evaluation method the dominant computing loads of MCTS come from those node evaluations, rather than the tree traversing

  • SpecMCTS accelerates the search process by using a pair of deep neural network (DNN) models: the speculation model and the main model. These models are trained for the same objective functions, but they use different DNN configurations to be used as different roles during the tree traversal

Read more

Summary

INTRODUCTION

Monte Carlo Tree Search (MCTS) demonstrated its effectiveness in complex control domains that require future planning, such as video games [1] and the game of Go [2]–[4]. Regardless of the node evaluation method (whether it is based on Monte-Carlo rollout simulations or calculated using DNNs) the dominant computing loads of MCTS come from those node evaluations, rather than the tree traversing. SpecMCTS accelerates the search process by using a pair of DNN models: the speculation model and the main model These models are trained for the same objective functions, but they use different DNN configurations to be used as different roles during the tree traversal. The speculation model may result in less accurate node evaluations, the resulting decision quality is better than the previous state-of-the-art for MCTS acceleration. We evaluate the performance and the decision quality of SpecMCTS for the game of Go. Compared to the sequential MCTS, SpecMCTS accelerates the tree traversal process by up to 2.07× on the NVIDIA Tesla T4 GPU. When the MCTS players are limited to a fixed decision time, SpecMCTS can result in a higher win rate compared to the sequential MCTS and the previous state-of-the-arts for MCTS acceleration

BACKGROUND
LIMITATIONS
CONSTRUCTING THE SPECULATION MODELS
VIII. CONCLUSION
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call