Abstract
Monte Carlo Tree Search (MCTS) algorithms show outstanding strength in decision-making problems such as the game of Go. However, MCTS requires substantial computation, because many nodes in the decision tree must be evaluated to make a good decision. Parallelizing MCTS node evaluations is challenging because MCTS is a sequential process in which each round of tree traversal depends on the previous node evaluations. In this work, we present SpecMCTS, a new approach for accelerating MCTS by speculatively traversing the search tree. Many MCTS applications, such as AlphaGo Zero, use a deep neural network (DNN) model to evaluate the tree nodes during the search. SpecMCTS uses a pair of DNN models: the speculation model and the main model. The faster (but less accurate) speculation model accelerates the sequential tree search, while the more accurate main model improves the decision quality. SpecMCTS accelerates MCTS for the game of Go by up to $2.09\times$ on the NVIDIA T4 GPU. This performance improvement can be translated into better decision quality by performing a larger number of tree traversals within the time limit. For a fixed decision time, SpecMCTS shows stronger gameplay (a higher win rate) than the original sequential MCTS and state-of-the-art MCTS parallelization approaches.
Highlights
Monte Carlo Tree Search (MCTS) has demonstrated its effectiveness in complex control domains that require future planning, such as video games [1] and the game of Go [2]–[4]
Regardless of the node evaluation method, the dominant computing load of MCTS comes from the node evaluations rather than from the tree traversal
SpecMCTS accelerates the search process by using a pair of deep neural network (DNN) models: the speculation model and the main model. These models are trained on the same objective functions, but they use different DNN configurations so that they can play different roles during the tree traversal
Summary
Monte Carlo Tree Search (MCTS) has demonstrated its effectiveness in complex control domains that require future planning, such as video games [1] and the game of Go [2]–[4]. Regardless of the node evaluation method (whether it is based on Monte Carlo rollout simulations or calculated using DNNs), the dominant computing load of MCTS comes from the node evaluations rather than from the tree traversal. SpecMCTS accelerates the search process by using a pair of DNN models: the speculation model and the main model. These models are trained on the same objective functions, but they use different DNN configurations so that they can play different roles during the tree traversal. Although the speculation model may produce less accurate node evaluations, the resulting decision quality is better than that of the previous state-of-the-art approaches for MCTS acceleration. We evaluate the performance and the decision quality of SpecMCTS for the game of Go. Compared to the sequential MCTS, SpecMCTS accelerates the tree traversal process by up to 2.07× on the NVIDIA Tesla T4 GPU. When the MCTS players are limited to a fixed decision time, SpecMCTS can achieve a higher win rate than the sequential MCTS and the previous state-of-the-art approaches for MCTS acceleration.
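The two-model idea can be illustrated with a small Python sketch. This is not the SpecMCTS implementation: the text here does not say how the main model's results are merged back into the tree, so the sketch assumes one plausible scheme, in which the speculation model's value is backed up immediately and later replaced by the main model's value through a delta update. The names `Node`, `backup`, `speculative_evaluate` and `correct_with_main` are hypothetical, both models are stand-in callables, and the per-ply sign flip needed for two-player games such as Go is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Minimal tree node holding only backup statistics (hypothetical)."""
    parent: Optional["Node"] = None
    visits: int = 0
    value_sum: float = 0.0

    @property
    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


# A pending entry records the leaf, its state, and the speculative value used.
Pending = Tuple[Node, object, float]


def backup(node: Optional[Node], value: float, count_visit: bool = True) -> None:
    """Add `value` to every node on the path from `node` up to the root."""
    while node is not None:
        if count_visit:
            node.visits += 1
        node.value_sum += value
        node = node.parent


def speculative_evaluate(leaf: Node, state: object,
                         spec_model: Callable[[object], float],
                         pending: List[Pending]) -> None:
    """Back up the fast model's value now so the next traversal can start."""
    v_spec = spec_model(state)
    backup(leaf, v_spec)
    pending.append((leaf, state, v_spec))


def correct_with_main(main_model: Callable[[object], float],
                      pending: List[Pending]) -> None:
    """Assumed correction step: swap each speculative value for the main one."""
    for leaf, state, v_spec in pending:
        v_main = main_model(state)
        # Apply only the difference, without counting an extra visit.
        backup(leaf, v_main - v_spec, count_visit=False)
    pending.clear()
```

In this sketch, the tree statistics after `correct_with_main` are the same as if the main model had evaluated every leaf directly, while the traversal itself only ever waits on the faster speculation model. Whether SpecMCTS corrects values this way, or re-traverses when the two models disagree, is not stated in the text above.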