Abstract

Counterfactual regret minimization (CFR) is a popular method for finding approximate Nash equilibria in two-player zero-sum games with imperfect information. Solving large-scale games with CFR requires a combination of abstraction techniques and expert knowledge, which constrains its scalability. Recent neural-based CFR methods mitigate the need for abstraction and expert knowledge by training an efficient network to estimate counterfactual regret directly. However, these methods only estimate regret values for individual actions and neglect the evaluation of state values, which are crucial for decision-making. In this article, we introduce deep dueling CFR (D2CFR), which emphasizes state value estimation by employing a novel value network with a dueling structure. Moreover, a rectification module based on a time-shifted Monte Carlo simulation is designed to correct inaccurate state value estimates. Extensive experiments show that D2CFR converges faster than and outperforms comparison methods on test games.
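The two ideas in the abstract can be illustrated concretely. A dueling structure decomposes a per-action estimate into a scalar state value plus mean-centered per-action advantages, and CFR converts (cumulative) regret estimates into a strategy via regret matching. The sketch below is a minimal NumPy illustration of these two generic mechanisms; the function names and the flat-array setup are hypothetical simplifications, not the paper's actual network or training procedure.

```python
import numpy as np

def dueling_regret_estimate(state_value, advantages):
    """Dueling-style combination (hypothetical simplification): a scalar
    state-value estimate plus per-action advantages. The advantages are
    mean-centered so the state value alone carries the average level."""
    advantages = np.asarray(advantages, dtype=float)
    return state_value + advantages - advantages.mean()

def regret_matching(cumulative_regret):
    """Standard CFR regret matching: play each action in proportion to its
    positive cumulative regret; fall back to uniform if none is positive."""
    positive = np.maximum(np.asarray(cumulative_regret, dtype=float), 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(positive), 1.0 / len(positive))

# Example: state value 2.0, advantages [1.0, -1.0, 0.0]
# -> centered estimates [3.0, 1.0, 2.0], strategy proportional to them.
estimates = dueling_regret_estimate(2.0, [1.0, -1.0, 0.0])
strategy = regret_matching(estimates)
```

In a full neural CFR method the estimates would come from a trained network and be accumulated across iterations; the point here is only how the dueling decomposition and regret matching interact.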
