Abstract

Finding Nash equilibria in imperfect information games is a challenging problem that has received much attention. Neural Fictitious Self-Play (NFSP) is a popular model-free machine learning algorithm that has been used to compute approximate Nash equilibria in such games. However, the deep reinforcement learning method used to approximate the best response in NFSP assumes fully observable, Markovian states, whereas the states in imperfect information games are partially observable and non-Markovian; this mismatch leads to a poor approximation of the best response, so NFSP needs more iterations to converge. In this study, we present a new reinforcement learning method, inspired by counterfactual regret minimization, that relaxes the Markov requirement by iteratively updating the policy through a regret matching process. Combining this new reinforcement learning method with fictitious play, we further present a novel algorithm for finding approximate Nash equilibria in zero-sum imperfect information games. Experimental results on three benchmark games show that the new algorithm finds approximate Nash equilibria effectively and converges much faster than the baseline.
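
The regret matching update that underlies this family of methods can be sketched as follows. This is a minimal, generic illustration of standard regret matching, not the paper's exact algorithm; the function names, the payoff vector, and the two-action setup are hypothetical.

```python
import numpy as np

def regret_matching_policy(cumulative_regret):
    """Derive a policy from cumulative regrets: play each action in
    proportion to its positive regret; if no regret is positive,
    fall back to a uniform policy."""
    positive_regret = np.maximum(cumulative_regret, 0.0)
    total = positive_regret.sum()
    if total > 0:
        return positive_regret / total
    return np.full_like(cumulative_regret, 1.0 / len(cumulative_regret))

def regret_matching_step(cumulative_regret, action_values):
    """One iteration: compute the current policy, measure how much better
    each action would have done than the policy's expected value, and
    accumulate that difference as regret."""
    policy = regret_matching_policy(cumulative_regret)
    expected_value = policy @ action_values
    cumulative_regret += action_values - expected_value
    return policy, cumulative_regret

# Hypothetical example: two actions with fixed payoff estimates.
regrets = np.zeros(2)
for _ in range(100):
    policy, regrets = regret_matching_step(regrets, np.array([1.0, 0.5]))
print(policy)  # the policy concentrates on the higher-value action
```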
