Abstract

In recent years, imperfect-information games have become an important touchstone for measuring the level of artificial intelligence. Many real-world scenarios are imperfect-information games, such as economic transactions, military games, and autonomous driving, so studying them has considerable practical significance. Guandan is an imperfect-information card game for four players divided into two teams. The large amount of hidden information in Guandan leads to a high-dimensional game state. Reinforcement learning algorithms search strategies efficiently in computer games, but they struggle to converge under the imperfect information and high-dimensional state space that Guandan presents. To address these problems, this paper applies the Proximal Policy Optimization (PPO) algorithm, a deep reinforcement learning method, to handle the imperfect information and the high-dimensional state and action spaces. It enables the agent to perceive high-dimensional information and make decisions based on the information it acquires. The experimental results show that the decision model based on PPO outperforms models based on the Policy Gradient and A2C algorithms, indicating that the system can improve its level of play at Guandan through self-learning.
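For readers unfamiliar with PPO, its core is the clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The following is a minimal sketch of that standard objective only, not the paper's implementation: the function name `ppo_clip_objective`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for illustration, and nothing here models Guandan's state or action encoding.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    clip_eps: clipping range; 0.2 is an assumed, commonly used default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # One action whose probability doubled, with a positive advantage.
    print(ppo_clip_objective([math.log(0.5)], [math.log(0.25)], [1.0]))
```

When the action's probability has doubled and the advantage is positive, the clipped term caps the gain at `1 + clip_eps` times the advantage; with a negative advantage the unclipped term is kept, so the penalty is not softened.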
