Abstract

Counterfactual regret minimization (CFR) is a popular method for finding approximate Nash equilibria in two-player zero-sum games with imperfect information. Solving large-scale games with CFR requires a combination of abstraction techniques and expert knowledge, which constrains its scalability. Recent neural-based CFR methods mitigate the need for abstraction and expert knowledge by training an efficient network to estimate counterfactual regret directly. However, these methods only estimate regret values for individual actions and neglect the evaluation of state values, which are crucial for decision-making. In this article, we introduce deep dueling CFR (D2CFR), which emphasizes state value estimation by employing a novel value network with a dueling structure. Moreover, a rectification module based on a time-shifted Monte Carlo simulation is designed to correct inaccurate state value estimates. Extensive experiments show that D2CFR converges faster than and outperforms comparison methods on test games.
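The two ideas in the abstract can be illustrated concretely. A dueling structure decomposes a per-action estimate into a scalar state value plus mean-centered per-action advantages, and CFR converts (cumulative) regret estimates into a strategy via regret matching. The sketch below is a minimal NumPy illustration of these two generic mechanisms; the function names and the flat-array setup are hypothetical simplifications, not the paper's actual network or training procedure.

```python
import numpy as np

def dueling_regret_estimate(state_value, advantages):
    """Dueling-style combination (hypothetical simplification): a scalar
    state-value estimate plus per-action advantages. The advantages are
    mean-centered so the state value alone carries the average level."""
    advantages = np.asarray(advantages, dtype=float)
    return state_value + advantages - advantages.mean()

def regret_matching(cumulative_regret):
    """Standard CFR regret matching: play each action in proportion to its
    positive cumulative regret; fall back to uniform if none is positive."""
    positive = np.maximum(np.asarray(cumulative_regret, dtype=float), 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(positive), 1.0 / len(positive))

# Example: state value 2.0, advantages [1.0, -1.0, 0.0]
# -> centered estimates [3.0, 1.0, 2.0], strategy proportional to them.
estimates = dueling_regret_estimate(2.0, [1.0, -1.0, 0.0])
strategy = regret_matching(estimates)
```

In a full neural CFR method the estimates would come from a trained network and be accumulated across iterations; the point here is only how the dueling decomposition and regret matching interact.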
