Abstract

Monte-Carlo Tree Search (MCTS) is an online planning algorithm that combines the ideas of best-first tree search and Monte-Carlo evaluation. Since MCTS is based on sampling, it does not require a transition function in explicit form, but only a generative model of the domain. Because it grows a highly selective search tree guided by its samples, it can handle huge search spaces with large branching factors. By using Monte-Carlo playouts, MCTS can take long-term rewards into account even with distant horizons. Combined with multi-armed bandit algorithms to trade off exploration and exploitation, MCTS has been shown to guarantee asymptotic convergence to the optimal policy, while providing approximations when stopped at any time. The relatively new MCTS approach has started a revolution in computer Go. Furthermore, it has achieved considerable success in domains as diverse as the games of Hex, Amazons, LOA, and Ms. Pac-Man, as well as in General Game Playing, planning, and optimization. Whereas the focus of previous MCTS research has been on practical applications, current research is beginning to address the problem of understanding the nature and underlying principles of MCTS. A careful understanding of MCTS will lead to more effective search algorithms. Hence, my two interrelated research questions are: How can we formulate models that increase our understanding of how MCTS works? And how can we use the developed understanding to create effective search algorithms? This research summary describes the first steps I undertook in these directions, as well as my plans for future work.
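Although the abstract does not spell out the algorithm, the combination it describes, bandit-based selection over a selectively grown tree plus Monte-Carlo playouts, is the UCT variant of MCTS. The sketch below is only a minimal illustration under stated assumptions, not the author's implementation: the generative-model interface (actions, step, is_terminal, reward) and the ToyChain domain are hypothetical stand-ins, and the model is assumed deterministic (a stochastic model would resample transitions during selection).

import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> child Node
        self.visits = 0
        self.value = 0.0     # running mean of playout returns

def ucb1(parent, child, c=1.4):
    # Bandit trade-off: exploit a high mean value, explore rarely tried children.
    return child.value + c * math.sqrt(math.log(parent.visits) / child.visits)

def mcts(root_state, model, n_iter=1000, c=1.4, horizon=100):
    root = Node(root_state)
    for _ in range(n_iter):
        node, state = root, root_state
        # 1. Selection: descend via UCB1 while the node is fully expanded.
        while (not model.is_terminal(state)
               and node.children
               and len(node.children) == len(model.actions(state))):
            action, node = max(node.children.items(),
                               key=lambda kv: ucb1(node, kv[1], c))
            state = model.step(state, action)
        # 2. Expansion: add one untried action as a new leaf.
        if not model.is_terminal(state):
            untried = [a for a in model.actions(state) if a not in node.children]
            action = random.choice(untried)
            state = model.step(state, action)
            child = Node(state, parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: random playout; only the generative model is needed.
        s = state
        for _ in range(horizon):
            if model.is_terminal(s):
                break
            s = model.step(s, random.choice(model.actions(s)))
        ret = model.reward(s)
        # 4. Backpropagation: update mean values along the selected path.
        while node is not None:
            node.visits += 1
            node.value += (ret - node.value) / node.visits
            node = node.parent
    # Recommend the most-visited root action (an anytime approximation).
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

class ToyChain:
    # Hypothetical toy domain: walk left/right on 0..10, reward 1 at state 10.
    def actions(self, s):
        return [] if s in (0, 10) else [-1, +1]
    def step(self, s, a):
        return s + a
    def is_terminal(self, s):
        return s in (0, 10)
    def reward(self, s):
        return 1.0 if s == 10 else 0.0

print(mcts(5, ToyChain(), n_iter=2000))  # usually prints +1

Because selection uses UCB1 at every interior node, the search is asymptotically consistent yet can be stopped at any time for an approximate recommendation, matching the anytime-convergence property stated above.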
