Abstract

Poker is a canonical game of incomplete information and remains a longstanding challenge problem in artificial intelligence (AI). The poker game of Dou Dizhu is viewed as a particularly thorny domain in AI because of its distinctive characteristics. This article introduces an enhanced Monte Carlo tree search (MCTS) method for effective decision making in Dou Dizhu. We build a winning rate prediction model (WRPM) that predicts the winning rate of candidate moves as an initial situation estimate, and extend the model so that it applies to the different player roles. The WRPM is then embedded as the core algorithm in the expansion and simulation phases of MCTS; we name the combined method WRPM-MCTS. In addition, we train a card distribution prediction model that infers opponents' hidden cards, further improving the performance of the WRPM-MCTS agent for Dou Dizhu. Experiments show that WRPM-MCTS performs statistically significantly better than both pure MCTS and the pure WRPM. Playing against human players on an online game platform, the WRPM-MCTS-based agent achieved a winning rate of 52.86% over 4,000,000 games and ranked in the top 1.22% of 500,000 human players, indicating that the agent has reached human expert level.
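The abstract describes embedding a learned win-rate model into the expansion and simulation phases of MCTS. The paper's actual WRPM architecture and game rules are not given here, so the following is only a minimal sketch of that pattern on a toy state space: `predicted_win_rate` is a hypothetical stand-in for the trained model, and the simulation phase returns its estimate instead of playing out a random rollout.

```python
import math
import random

def predicted_win_rate(state):
    # Hypothetical stand-in for the paper's WRPM: a trained model would
    # map a real Dou Dizhu state to a win-rate estimate. Toy heuristic here.
    return (state % 10) / 10.0

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def ucb1(self, c=1.4):
        # Standard UCB1 selection score; unvisited nodes are tried first.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def legal_moves(state):
    # Toy move generator; a real agent would enumerate legal card plays.
    return [state + 1, state + 2, state + 3]

def wrpm_mcts(root_state, iterations=200, seed=0):
    random.seed(seed)
    root = Node(root_state)
    for _ in range(iterations):
        # Selection: descend by UCB1 until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb1)
        # Expansion: add children once the leaf has been visited.
        if node.visits > 0:
            for s in legal_moves(node.state):
                node.children.append(Node(s, parent=node))
            node = random.choice(node.children)
        # Simulation replaced by the model's win-rate estimate,
        # mirroring where the abstract says the WRPM is embedded.
        reward = predicted_win_rate(node.state)
        # Backpropagation: accumulate visits and value up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited root move, the usual MCTS final choice.
    best = max(root.children, key=lambda n: n.visits)
    return best.state
```

In this sketch the model estimate serves both as the initial situation estimate for newly expanded nodes and as the simulation result, which is the role the abstract assigns to the WRPM; the abstract's card distribution prediction model (inferring opponents' hidden cards) is omitted.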
