Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

Edward Lockhart, Julien Pérolat, Karl Tuyls, Jean-Baptiste Lespiau, Finbarr Timbers, Dustin Morrill, Marc Lanctot

doi:10.24963/ijcai.2019/66

Abstract

In this paper, we present exploitability descent, a new algorithm for computing approximate equilibria in two-player zero-sum extensive-form games with imperfect information by direct policy optimization against worst-case opponents. We prove that when following this optimization, the exploitability of a player's strategy converges asymptotically to zero, and hence when both players employ this optimization, the joint policies converge to a Nash equilibrium. Unlike extensive-form fictitious play (XFP) and counterfactual regret minimization (CFR), our convergence result pertains to the policies being optimized rather than the average policies. Our experiments demonstrate convergence rates comparable to XFP and CFR in four benchmark games in the tabular case. Using function approximation, we find that our algorithm outperforms the tabular version in two of the games, which, to the best of our knowledge, is the first such result in imperfect-information games among this class of algorithms.
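The core idea described above, updating each player's current policy by gradient ascent on its value against a best-responding (worst-case) opponent, can be illustrated on a much smaller problem. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses a one-shot zero-sum matrix game (rock-paper-scissors) instead of an extensive-form game, tabular policies on the probability simplex, a Euclidean projection after each step, and an assumed 1/sqrt(t) step-size schedule. The helper names `project_to_simplex`, `exploitability`, and `exploitability_descent` are hypothetical. Exploitability is measured here as NashConv, the sum of both players' best-response gains, which is zero exactly at a Nash equilibrium.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - cssv / idx > 0)[0][-1]
    theta = cssv[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def exploitability(A, x, y):
    """NashConv of (x, y) when the row player receives x^T A y and the column player -x^T A y."""
    value = x @ A @ y
    row_gain = np.max(A @ y) - value  # row player's gain from best-responding to y
    col_gain = value - np.min(x @ A)  # column player's gain from best-responding to x
    return float(row_gain + col_gain)


def exploitability_descent(A, x, y, steps=10_000, lr=0.5):
    """Hypothetical ED sketch: each player ascends its value against a best-responding opponent."""
    for t in range(steps):
        eta = lr / np.sqrt(t + 1.0)  # assumed diminishing step size
        # Column best response to x; the row value x @ A[:, j] has gradient A[:, j] in x.
        j = int(np.argmin(x @ A))
        # Row best response to y; the column value -A[i, :] @ y has gradient -A[i, :] in y.
        i = int(np.argmax(A @ y))
        # Gradient ascent on each player's current policy, projected back onto the simplex.
        x = project_to_simplex(x + eta * A[:, j])
        y = project_to_simplex(y - eta * A[i, :])
    return x, y


# Rock-paper-scissors payoffs for the row player; the unique equilibrium is uniform play.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
x0 = np.array([0.6, 0.3, 0.1])
y0 = np.array([0.1, 0.1, 0.8])
x, y = exploitability_descent(A, x0, y0)
print("initial NashConv:", exploitability(A, x0, y0))
print("final NashConv:  ", exploitability(A, x, y))
print("final policies:  ", np.round(x, 3), np.round(y, 3))
```

Because the opponent best-responds to a player's own policy, each player's update here depends only on its own current policy, and the quantity driven toward zero is the exploitability of the current iterates rather than of an averaged policy, which is the distinction the abstract draws with XFP and CFR. In this toy setting the projection and step schedule are convenience choices; the paper's tabular and function-approximation variants may parameterize and update policies differently.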

Similar Papers

Scalable sub-game solving for imperfect-information games
Huale Li ... Shuhan Qi
Knowledge-Based Systems | VOL. 231
26 Aug 2021

AutoCFR: Learning to Design Counterfactual Regret Minimization Algorithms
Hang Xu ... Qiang Fu
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 36
28 Jun 2022

D2CFR: Minimize Counterfactual Regret With Deep Dueling Neural Network.
Huale Li ... Xuan Wang
IEEE Transactions on Neural Networks and Learning Systems | VOL. PP
01 Jan 2024

Monte Carlo Sampling for Regret Minimization in Extensive Games
...
07 Dec 2009