Abstract
The digital curling game is a two-player zero-sum extensive game in a continuous action space. Several challenges remain open, including uncertainty in strategy, search over a large game tree, and dependence on large amounts of supervised data. In this work, we combine neural fictitious self-play (NFSP) and KR-UCT for digital curling games: NFSP uses two adversarial learning networks and automatically produces its own supervised data, while KR-UCT handles search over a large game tree in a continuous action space. We propose two reward mechanisms to make reinforcement learning converge quickly. Experimental results validate the proposed method and show that the strategy model can reach the Nash equilibrium.
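The way NFSP produces supervised data without manual labeling can be illustrated with a minimal sketch. It follows the standard NFSP formulation rather than the authors' implementation: with probability `eta` the agent acts with its best-response policy and records the (state, action) pair in a reservoir buffer, which later serves as the training set for the average policy. The names `ReservoirBuffer`, `nfsp_act`, and `eta`, and the toy policies passed in, are assumptions made for illustration.

```python
import random


class ReservoirBuffer:
    """Fixed-size buffer holding a uniform sample of every item ever added."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []
        self.seen = 0  # total number of items offered to the buffer

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Reservoir sampling: keep the new item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


def nfsp_act(state, best_response, average_policy, sl_buffer, eta, rng):
    """Pick an action; best-response actions become supervised training pairs."""
    if rng.random() < eta:
        action = best_response(state)
        sl_buffer.add((state, action))  # self-generated label, no human annotation
        return action, "best_response"
    return average_policy(state), "average"
```

In a full agent, `best_response` would be the reinforcement-learning network and `average_policy` the supervised network trained on `sl_buffer.items`; both are plain callables here so the sketch stays self-contained.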
Highlights
Machine games and artificial intelligence have long been closely related, and machine games are an important form of artificial intelligence.
From the game theory of von Neumann [1], the father of computers, to the well-known AlphaGo [2], machine games have always been in the public eye.
We combine neural fictitious self-play (NFSP) and kernel regression UCT (KR-UCT, where UCT stands for upper confidence bounds applied to trees) for digital curling games: NFSP avoids manual labeling of supervised data, and KR-UCT handles search over a large game tree in a continuous action space.
Summary
Machine games and artificial intelligence have been closely related, and machine games are an important form of artificial intelligence. The digital curling game has many action strategies, a large search space, and strong uncertainty, and it is a typical extensive-form game [3]. For extensive-form games, several challenges remain open, including uncertainty in strategy, search over a large game tree, and dependence on large amounts of supervised data. Some researchers have used Monte Carlo search and KR-UCT algorithms to improve game performance, but these solutions still need a lot of supervised data and prior knowledge. We combine NFSP and KR-UCT for digital curling games: NFSP avoids manual labeling of supervised data, and KR-UCT handles search over a large game tree in a continuous action space.
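To make the role of KR-UCT in a continuous action space concrete, the sketch below shows the kernel-regression idea it relies on: the value of a candidate action is estimated as a kernel-weighted average over all sampled actions, so nearby shots share information, and a UCB-style score uses the kernel density in place of a raw visit count. This is an illustrative sketch, not the authors' code. The Gaussian kernel, the `bandwidth` and `c` parameters, and all function names are assumptions, and every visit count is assumed to be at least 1.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """Similarity between two action vectors (1.0 when identical)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def kr_estimates(actions, mean_values, visit_counts, bandwidth):
    """Kernel-regression value and kernel density for each sampled action."""
    values, densities = [], []
    for a in actions:
        weights = [gaussian_kernel(a, b, bandwidth) * n
                   for b, n in zip(actions, visit_counts)]
        density = sum(weights)  # effective visit count around action a
        value = sum(w * v for w, v in zip(weights, mean_values)) / density
        values.append(value)
        densities.append(density)
    return values, densities


def kr_ucb_select(actions, mean_values, visit_counts, bandwidth, c):
    """Index of the action maximizing value plus an exploration bonus."""
    values, densities = kr_estimates(actions, mean_values, visit_counts, bandwidth)
    total = sum(densities)
    scores = [v + c * math.sqrt(math.log(total) / w)
              for v, w in zip(values, densities)]
    return max(range(len(actions)), key=scores.__getitem__)
```

A small bandwidth treats each sampled action almost independently, while a large bandwidth pools statistics across the whole action set; choosing it is a tuning decision that this sketch leaves open.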