Abstract

We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement Learning based approach. Specifically, we propose the Explore-Improve-Supervise (EIS) method, which combines exploration, improvement and supervised learning to find the value function and policy associated with Nash equilibrium. We identify sufficient conditions for convergence and correctness of such an approach. For a concrete instance of EIS where a random policy is used for exploration, Monte-Carlo Tree Search is used for improvement and Nearest Neighbors is used for supervised learning, we establish that this method finds an $\varepsilon$-approximate value function of Nash equilibrium in $\widetilde{O}(\varepsilon^{-(d+4)})$ steps when the underlying state space of the game is continuous and $d$-dimensional. This is nearly optimal, as we establish a lower bound of $\widetilde{\Omega}(\varepsilon^{-(d+2)})$ for any policy.
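To make the three modules concrete, here is a minimal sketch of a few EIS rounds on a toy game over the unit cube $[0,1]^d$. The toy dynamics, reward and the one-step-lookahead improve_estimate function are hypothetical stand-ins introduced only for illustration; the paper's concrete instance uses a random exploration policy, Monte-Carlo Tree Search for the improvement step and Nearest Neighbors for the supervised step.

# Hedged sketch of the Explore-Improve-Supervise (EIS) loop on a toy game.
# `improve_estimate`, the dynamics and the reward below are hypothetical; the
# paper's instance replaces the lookahead with Monte-Carlo Tree Search.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
d = 2                      # state dimension
n_samples = 200            # states queried per round
n_rounds = 5

def improve_estimate(state, value_fn):
    # Stand-in for the improvement module: a crude one-step lookahead over
    # random successor states instead of MCTS.
    actions = rng.uniform(-0.1, 0.1, size=(8, d))        # hypothetical action set
    successors = np.clip(state + actions, 0.0, 1.0)
    rewards = -np.linalg.norm(successors - 0.5, axis=1)  # hypothetical reward
    # Zero-sum, turn-based: the opponent moves next, so its value is negated.
    return float(np.max(rewards - 0.9 * value_fn(successors)))

def make_value_fn(model, fitted):
    # Current value-function estimate; zero before the first supervised fit.
    def value_fn(states):
        return model.predict(states) if fitted else np.zeros(len(states))
    return value_fn

model, fitted = KNeighborsRegressor(n_neighbors=5), False
for _ in range(n_rounds):
    # Explore: sample states with a random exploration policy (uniform here).
    states = rng.uniform(0.0, 1.0, size=(n_samples, d))
    # Improve: query improved value estimates at the sampled states.
    targets = np.array([improve_estimate(s, make_value_fn(model, fitted)) for s in states])
    # Supervise: fit the nearest-neighbor value function to the improved targets.
    model.fit(states, targets)
    fitted = True

print("value estimate at the centre state:", model.predict([[0.5] * d]))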

Highlights

  • In 2016, AlphaGo [24] became the first program to defeat the world champion in the game of Go

  • Motivated by the remarkable success of this method, in this work we study the problem of finding Nash Equilibrium for two-player turn-based zero-sum games and in particular consider a reinforcement learning based approach

  • We introduce the framework of Markov Games (MGs) with two players and zero-sum rewards


Summary

Introduction

In 2016, AlphaGo [24] became the first program to defeat the world champion in the game of Go. Soon after, another program, AlphaGo Zero (AGZ) [26], achieved even stronger performance despite learning the game from scratch given only the rules. AGZ mastered the game of Go entirely through self-play using a new reinforcement learning algorithm. The same algorithm was shown to achieve superhuman performance in Chess and Shogi [25]. Motivated by the remarkable success of this method, in this work we study the problem of finding Nash Equilibrium for two-player turn-based zero-sum games and in particular consider a reinforcement learning based approach.
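For context, the solution concept can be written as a minimax (Nash equilibrium) value; the display below is a standard formulation for discounted two-player turn-based zero-sum games rather than the paper's exact notation:

$$V^{*}(s) \;=\; \max_{\pi_1}\,\min_{\pi_2}\; \mathbb{E}\!\left[\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t) \;\middle|\; s_0 = s,\ \pi_1,\ \pi_2\right],$$

where player 1 picks actions in the states it controls, player 2 in the remaining states, $r$ is the reward that player 2 pays player 1 (zero-sum), and $\gamma \in (0,1)$ is a discount factor. At a Nash equilibrium the max and min can be interchanged, and EIS aims to learn $V^{*}$ together with the corresponding policies.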
