Abstract

Teaching computer programs to play games through machine learning has been an important way to achieve better artificial intelligence (AI) in a variety of real-world applications. Monte Carlo Tree Search (MCTS) is one of the key AI techniques developed recently that enabled AlphaGo to defeat a legendary professional Go player. What makes MCTS particularly attractive is that it requires only the basic rules of the game and does not rely on expert-level knowledge. Researchers therefore expect that MCTS can be applied to other complex AI problems where domain-specific expert knowledge is not yet available. So far, however, there have been very few analytic studies of MCTS in the literature. In this paper, our goal is to develop an analytic understanding of MCTS as a more fundamental basis for assessing its applicability to complex AI problems. We start with a simple version of MCTS, called random playout search (RPS), applied to Tic-Tac-Toe, and find that RPS may fail to discover the correct moves even in a very simple game position. Both probability analysis and simulation confirm this finding. We then study the full version of MCTS applied to Gomoku and find that, while MCTS has shown great success in more sophisticated games such as Go, it is not effective at handling sudden death/win. The main reason MCTS often fails to detect sudden death/win lies in its reliance on random playouts, which leads to prediction distortion. Although MCTS in theory converges to the optimal minimax search, under real-world computational resource constraints it must rely on RPS as an essential step in its search process, and it therefore suffers from the same fundamental prediction-distortion problem as RPS. By examining the detailed statistics of the scores in MCTS, we investigate a variety of scenarios in which MCTS fails to detect sudden death/win. Finally, we propose an improved MCTS algorithm that incorporates minimax search to overcome prediction distortion. Our simulations confirm the effectiveness of the proposed algorithm. We estimate the additional computational cost of the new algorithm for detecting sudden death/win and discuss heuristic strategies to further reduce the search complexity.
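To make the failure mode concrete, here is a minimal Python sketch of random playout search on Tic-Tac-Toe (our illustration, not the paper's code; the example position and the playout count of 200 are assumptions chosen for exposition). It scores each legal move by the fraction of uniformly random playouts that move wins. In the example position X must block O's open row at cell 5, yet non-blocking moves can still receive competitive scores because the random opponent finds the refutation only by chance.

    import random

    # The eight winning lines of a 3x3 board with cells indexed 0..8.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

    def winner(board):
        for a, b, c in LINES:
            if board[a] != '.' and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def random_playout(board, to_move):
        # Play uniformly random moves until someone wins or the board fills.
        board = board[:]
        player = to_move
        while True:
            w = winner(board)
            if w:
                return w
            empty = [i for i, v in enumerate(board) if v == '.']
            if not empty:
                return 'draw'
            board[random.choice(empty)] = player
            player = 'O' if player == 'X' else 'X'

    def rps_scores(board, to_move, n_playouts=200):
        # Score each legal move by its random-playout win rate.
        opponent = 'O' if to_move == 'X' else 'X'
        scores = {}
        for m in (i for i, v in enumerate(board) if v == '.'):
            child = board[:]
            child[m] = to_move
            wins = sum(random_playout(child, opponent) == to_move
                       for _ in range(n_playouts))
            scores[m] = wins / n_playouts
        return scores

    # Illustrative position (an assumption): X at 0 and 8, O at 3 and 4,
    # X to move. O threatens to win at cell 5, so 5 is the only move that
    # avoids an immediate loss under optimal play.
    board = list("X..OO...X")
    print(sorted(rps_scores(board, 'X').items(), key=lambda kv: -kv[1]))

Note that after any non-blocking move by X, the random opponent punishes it immediately with probability only 1/4 (one winning reply among four empty cells), so losing moves retain inflated scores; this is the kind of prediction distortion that the paper's probability analysis quantifies.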

Highlights

  • Teaching computer programs to play games through learning has been an important way to achieve better artificial intelligence (AI) in a variety of real-world applications [1]

  • We continue our studies with the full version of Monte Carlo Tree Search (MCTS) applied to Gomoku and find that, while MCTS has shown great success in more sophisticated games such as Go, it is not effective at handling sudden death/win

  • Although MCTS in theory converges to the optimal minimax search, under real-world computational resource constraints it has to rely on random playout search (RPS) as an essential step in its search process, and it therefore suffers from the same fundamental prediction-distortion problem as RPS (see the sketch after this list)
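As a concrete reading of that last point, here is a hedged sketch of the kind of shallow minimax check that can be spliced into MCTS to resolve sudden death/win exactly instead of estimating it from playouts (our illustration of the general idea, not necessarily the paper's exact algorithm). It reuses LINES and winner from the sketch above and assumes, as in Tic-Tac-Toe and Gomoku, that one move can block at most one immediate threat.

    def legal_moves(board):
        return [i for i, v in enumerate(board) if v == '.']

    def play(board, move, player):
        child = board[:]
        child[move] = player
        return child

    def sudden_outcome(board, to_move):
        # Exact shallow check, run before falling back to random playouts.
        opponent = 'O' if to_move == 'X' else 'X'
        # Sudden win: some move by `to_move` completes a line right now.
        for m in legal_moves(board):
            if winner(play(board, m, to_move)) == to_move:
                return ('win', m)
        # Cells where the opponent would win immediately on its turn.
        threats = [m for m in legal_moves(board)
                   if winner(play(board, m, opponent)) == opponent]
        if len(threats) >= 2:
            return ('loss', None)          # one move cannot block two threats
        if len(threats) == 1:
            return ('forced', threats[0])  # blocking is the only viable move
        return ('unknown', None)           # fall back to playouts / UCT

    # In the earlier example position, X's only non-losing move is found
    # exactly, with no playouts at all:
    print(sudden_outcome(list("X..OO...X"), 'X'))  # -> ('forced', 5)

A full integration would run such a check at each visited node before the playout step; the paper estimates the additional computational cost of detecting sudden death/win this way and discusses heuristic strategies to reduce it.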


Introduction

Teaching computer programs to play games through learning has been an important way to achieve better artificial intelligence (AI) in a variety of real-world applications [1]. In 1997 a computer program called IBM Deep Blue defeated Garry Kasparov, the reigning world chess champion. The principal idea of Deep Blue is to generate a search tree and to evaluate game positions by applying expert knowledge on chess. To many people's surprise, in March 2016, AlphaGo, a computer program created by Google's DeepMind, won a five-game match 4-1 against Lee Sedol, a legendary professional Go player. Monte Carlo Tree Search (MCTS) is one of the two key techniques behind AlphaGo (the other being deep learning) [4].
