Abstract

Mainstream Go AI algorithms represented by AlphaZero and KataGo suffer from low-quality samples early in training and low exploration efficiency in traditional Monte Carlo Tree Search (MCTS). To address these shortcomings, variable-scale training is proposed: a variable-scale board with boundary conditions of randomly placed stones around the periphery is introduced to pre-train a small-scale network that recommends local move policy and ownership. This network is used to improve the backbone network's move policy and state value, enhancing the quality of game samples in the early stages of training. To improve search efficiency and convergence speed, we propose Parallel Monte Carlo Tree Search with Potential Upper Bound (PUB-PMCTS): multiple unevaluated searches are executed sequentially and the collected leaf nodes are then evaluated in parallel; in addition, the variance of a node's action values is used to forecast the node's potential upper bound. We further add a self-attention mechanism to the network to extract global contextual information from features, and a maximum-entropy loss to increase the model's exploration ability. With these improvements, the bot TransGo is designed. Experimental results show that in a 13×13 Go environment, TransGo performs more stably and plays at a higher level early in training than the other algorithms. After four days of training with TransGo, KataGo, and AlphaZero, TransGo gained 102 Elo over KataGo and more than 1000 Elo over AlphaZero.
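The two-phase search described above can be sketched in miniature. This is a hedged illustration, not the paper's implementation: the constants `c_puct` and `c_var`, the uniform priors, the single-ply tree, and the `evaluate_batch` stub are all assumptions standing in for the real network and tree machinery. It shows the two ideas the abstract names: several unevaluated descents collected sequentially (kept apart by a virtual loss), followed by one batched leaf evaluation, with a variance bonus added to each node's selection score as a proxy for its potential upper bound.

```python
import math
import random

class Node:
    def __init__(self, prior=1.0):
        self.prior = prior
        self.children = {}   # action -> Node
        self.n = 0           # visit count
        self.w = 0.0         # sum of backed-up values
        self.w2 = 0.0        # sum of squared values (for the variance bonus)
        self.vloss = 0       # virtual loss applied during batched selection

    def q(self):
        return self.w / self.n if self.n else 0.0

    def var(self):
        if self.n < 2:
            return 0.0
        return max(self.w2 / self.n - self.q() ** 2, 0.0)

def pub_score(parent, child, c_puct=1.5, c_var=0.5):
    # PUCT-style exploration term plus a variance bonus: a node whose action
    # values vary widely is treated as having a higher potential upper bound.
    n_eff = child.n + child.vloss
    u = c_puct * child.prior * math.sqrt(parent.n + 1) / (1 + n_eff)
    q = (child.w - child.vloss) / n_eff if n_eff else 0.0
    return q + u + c_var * math.sqrt(child.var())

def select_leaf(root):
    """One unevaluated descent; virtual loss makes parallel descents diverge."""
    path, node = [root], root
    while node.children:
        _, node = max(node.children.items(),
                      key=lambda kv: pub_score(path[-1], kv[1]))
        node.vloss += 1
        path.append(node)
    return path

def backup(path, value):
    for node in path:
        node.n += 1
        node.w += value
        node.w2 += value * value
        node.vloss = max(node.vloss - 1, 0)

def pub_pmcts(root, actions, evaluate_batch, batch_size=8, rounds=16):
    # Expand the root once with uniform priors (stand-in for a policy network).
    for a in actions:
        root.children[a] = Node(prior=1.0 / len(actions))
    root.n = 1
    for _ in range(rounds):
        # Phase 1: several unevaluated descents, collected sequentially.
        paths = [select_leaf(root) for _ in range(batch_size)]
        # Phase 2: evaluate all collected leaves in one batched call.
        values = evaluate_batch([p[-1] for p in paths])
        for path, v in zip(paths, values):
            backup(path, v)
    # Final move choice by visit count, as in standard MCTS.
    return max(root.children.items(), key=lambda kv: kv[1].n)[0]
```

In a real engine, `evaluate_batch` would be a single forward pass of the value network over all collected positions, which is where the parallel speedup over one-leaf-at-a-time MCTS comes from.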
