Abstract

In recent years, imperfect-information games have become an important benchmark for testing the level of artificial intelligence. Many real-world scenarios are imperfect-information games, such as economic transactions, military games, and autonomous driving, so the study of these problems has considerable practical significance. Guandan is an imperfect-information card game with four players divided into two teams. The large amount of hidden information in Guandan leads to a high-dimensional game state. Reinforcement learning algorithms are effective at strategy search in computer games, but they fail to converge under the imperfect information and high-dimensional state space that Guandan presents. To address these problems, this paper introduces the Proximal Policy Optimization (PPO) algorithm, a deep reinforcement learning method, to handle imperfect information and the high-dimensional state and action spaces. It enables the agent to perceive high-dimensional information and make decisions based on the information it acquires. The experimental results show that the decision model based on PPO outperforms models based on the Policy Gradient and A2C algorithms, indicating that the system has a self-learning ability that improves its level of play in Guandan.
