ADP with MCTS algorithm for Gomoku

Zhentao Tang, Le L.V, Kun Shao, Dongbin Zhao

doi:10.1109/ssci.2016.7849371

Abstract

Inspired by the core idea of AlphaGo, we combine a neural network trained by Adaptive Dynamic Programming (ADP) with the Monte Carlo Tree Search (MCTS) algorithm for Gomoku. MCTS is based on Monte Carlo simulation: it runs a large number of simulated games and builds a game search tree. We perform rollouts from the leaf nodes of the tree and collect their outcomes, which gives the MCTS winning rate. ADP and MCTS each produce their own estimate of the winning rate. We combine the two winning rates with weights and select the action position with the maximum weighted value. Experimental results show that this method effectively eliminates the “short-sighted” defect of the neural network evaluation function. With the proposed method, the prediction of the game's final result is more accurate, and it outperforms Gomoku play using the ADP algorithm alone.
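
The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the convex combination below, the weight `lam`, and the names `select_move`, `adp_rates`, and `mcts_rates` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, Tuple

Move = Tuple[int, int]  # (row, column) of a candidate position on the board


def select_move(
    adp_rates: Dict[Move, float],
    mcts_rates: Dict[Move, float],
    lam: float = 0.5,
) -> Move:
    """Pick the move with the highest weighted winning rate.

    Assumed combination (hypothetical, not taken from the paper):
        combined = lam * adp_rate + (1 - lam) * mcts_rate
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    # Only moves evaluated by both estimators can be compared.
    # Sorting makes tie-breaking deterministic.
    candidates = sorted(adp_rates.keys() & mcts_rates.keys())
    if not candidates:
        raise ValueError("no move was evaluated by both ADP and MCTS")

    def combined(move: Move) -> float:
        return lam * adp_rates[move] + (1.0 - lam) * mcts_rates[move]

    return max(candidates, key=combined)


# Made-up winning rates for two candidate positions.
adp = {(7, 7): 0.9, (7, 8): 0.6}
mcts = {(7, 7): 0.2, (7, 8): 0.7}
best = select_move(adp, mcts, lam=0.5)
```

In this made-up example the ADP estimate alone would prefer (7, 7), but the low MCTS winning rate for that position pulls its combined value below that of (7, 8). This is the kind of correction of a "short-sighted" evaluation that the abstract describes.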

Full Text