Abstract

We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement Learning based approach. Specifically, we propose the Explore-Improve-Supervise (EIS) method, which combines exploration, improvement and supervised learning to find the value function and policy associated with Nash equilibrium. We identify sufficient conditions for convergence and correctness of such an approach. For a concrete instance of EIS where a random policy is used for exploration, Monte-Carlo Tree Search is used for improvement and Nearest Neighbors is used for supervised learning, we establish that this method finds an $\varepsilon$-approximate value function of Nash equilibrium in $\widetilde{O}(\varepsilon^{-(d+4)})$ steps when the underlying state space of the game is continuous and $d$-dimensional. This is nearly optimal, as we establish a lower bound of $\widetilde{\Omega}(\varepsilon^{-(d+2)})$ for any policy.
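To make the three modules concrete, here is a minimal sketch of a few EIS rounds on a toy game over the unit cube $[0,1]^d$. The toy dynamics, reward and the one-step-lookahead improve_estimate function are hypothetical stand-ins introduced only for illustration; the paper's concrete instance uses a random exploration policy, Monte-Carlo Tree Search for the improvement step and Nearest Neighbors for the supervised step.

# Hedged sketch of the Explore-Improve-Supervise (EIS) loop on a toy game.
# `improve_estimate`, the dynamics and the reward below are hypothetical; the
# paper's instance replaces the lookahead with Monte-Carlo Tree Search.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
d = 2                      # state dimension
n_samples = 200            # states queried per round
n_rounds = 5

def improve_estimate(state, value_fn):
    # Stand-in for the improvement module: a crude one-step lookahead over
    # random successor states instead of MCTS.
    actions = rng.uniform(-0.1, 0.1, size=(8, d))        # hypothetical action set
    successors = np.clip(state + actions, 0.0, 1.0)
    rewards = -np.linalg.norm(successors - 0.5, axis=1)  # hypothetical reward
    # Zero-sum, turn-based: the opponent moves next, so its value is negated.
    return float(np.max(rewards - 0.9 * value_fn(successors)))

def make_value_fn(model, fitted):
    # Current value-function estimate; zero before the first supervised fit.
    def value_fn(states):
        return model.predict(states) if fitted else np.zeros(len(states))
    return value_fn

model, fitted = KNeighborsRegressor(n_neighbors=5), False
for _ in range(n_rounds):
    # Explore: sample states with a random exploration policy (uniform here).
    states = rng.uniform(0.0, 1.0, size=(n_samples, d))
    # Improve: query improved value estimates at the sampled states.
    targets = np.array([improve_estimate(s, make_value_fn(model, fitted)) for s in states])
    # Supervise: fit the nearest-neighbor value function to the improved targets.
    model.fit(states, targets)
    fitted = True

print("value estimate at the centre state:", model.predict([[0.5] * d]))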

Highlights

  • In 2016, AlphaGo [24] became the first program to defeat the world champion in the game of Go

  • Motivated by the remarkable success of this method, in this work we study the problem of finding Nash Equilibrium for two-player turn-based zero-sum games and in particular consider a reinforcement learning based approach

  • We introduce the framework of Markov Games (MGs) with two players and zero-sum rewards


Summary

Introduction

In 2016, AlphaGo [24] became the first program to defeat the world champion in the game of Go. Soon after, another program, AlphaGo Zero (AGZ) [26], achieved even stronger performance despite learning the game from scratch given only the rules. AGZ mastered the game of Go entirely through self-play using a new reinforcement learning algorithm. The same algorithm was shown to achieve superhuman performance in Chess and Shogi [25]. Motivated by the remarkable success of this method, in this work we study the problem of finding Nash Equilibrium for two-player turn-based zero-sum games and in particular consider a reinforcement learning based approach.
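For context, the solution concept can be written as a minimax (Nash equilibrium) value; the display below is a standard formulation for discounted two-player turn-based zero-sum games rather than the paper's exact notation:

$$V^{*}(s) \;=\; \max_{\pi_1}\,\min_{\pi_2}\; \mathbb{E}\!\left[\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t) \;\middle|\; s_0 = s,\ \pi_1,\ \pi_2\right],$$

where player 1 picks actions in the states it controls, player 2 in the remaining states, $r$ is the reward that player 2 pays player 1 (zero-sum), and $\gamma \in (0,1)$ is a discount factor. At a Nash equilibrium the max and min can be interchanged, and EIS aims to learn $V^{*}$ together with the corresponding policies.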
