Abstract

We define and analyse three learning dynamics for two-player zero-sum discounted-payoff stochastic games. A continuous-time best-response dynamic in mixed strategies is proved to converge to the set of Nash equilibrium stationary strategies. Extending this, we introduce a fictitious-play-like process in a continuous-time embedding of a stochastic zero-sum game, which is again shown to converge to the set of Nash equilibrium strategies. Finally, we present a modified δ-converging best-response dynamic, in which the discount rate converges to 1 and the learned value converges to the asymptotic value of the zero-sum stochastic game. The critical feature of all the dynamic processes is a separation of adaptation rates: beliefs about the values of states adapt more slowly than the strategies do, and in the case of the δ-converging dynamic the discount rate adapts more slowly than everything else.
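
To make the timescale separation concrete, here is a minimal Python sketch of an Euler-discretised best-response dynamic on a small randomly generated game; it is not the paper's formal construction. At each state the players' mixed strategies move toward best responses in the auxiliary game at rate dt, while the value beliefs track the auxiliary payoff at the much slower rate lam * dt. The game size, the rates dt and lam, and all variable names are illustrative assumptions.

    import numpy as np

    # Minimal sketch (not the paper's construction) of a two-timescale
    # best-response dynamic for a zero-sum discounted stochastic game.
    # Player 1 maximises and player 2 minimises the discounted payoff.
    rng = np.random.default_rng(0)
    S, A, B = 2, 2, 2                          # states, actions of players 1 and 2
    delta = 0.9                                # discount factor
    r = rng.uniform(-1, 1, (S, A, B))          # stage payoffs to player 1
    P = rng.dirichlet(np.ones(S), (S, A, B))   # transition kernel, P[s, a, b, s']

    def aux_game(s, v):
        """Auxiliary matrix game at state s given continuation-value beliefs v."""
        return r[s] + delta * (P[s] @ v)       # shape (A, B)

    pi = np.full((S, A), 1.0 / A)              # player 1's mixed strategy per state
    sigma = np.full((S, B), 1.0 / B)           # player 2's mixed strategy per state
    v = np.zeros(S)                            # value beliefs, one per state

    dt, lam, T = 0.01, 0.05, 100_000           # lam << 1: beliefs adapt more slowly
    for _ in range(T):
        for s in range(S):
            G = aux_game(s, v)
            br1 = np.eye(A)[np.argmax(G @ sigma[s])]   # best reply to sigma[s]
            br2 = np.eye(B)[np.argmin(pi[s] @ G)]      # best reply to pi[s]
            pi[s] += dt * (br1 - pi[s])                # fast: strategies
            sigma[s] += dt * (br2 - sigma[s])
            v[s] += dt * lam * (pi[s] @ G @ sigma[s] - v[s])   # slow: value beliefs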

Highlights

  • Evolutionary and learning approaches to game theory justify equilibrium play as the end point of a dynamic process resulting from adaptations made by boundedly rational players

  • The convergence of a continuous-time best-response dynamic to the set of Nash equilibria has been shown in Harris (1998), Hofbauer (1995), and Hofbauer and Sorin (2006) for two-player zero-sum games, in Harris (1998) for weighted-potential games, and in Berger (2005) for 2 × n games

  • If players are unable or unprepared to carry out equilibrium calculations or solve Bellman equations for future reward, could they nevertheless learn Nash equilibrium strategies? In the present paper, we focus on zero-sum stochastic games with discounted payoff, as introduced by Shapley (1953), and consider best-response dynamics (a value-iteration sketch of Shapley's model follows these highlights)
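
As background for the Bellman-equation remark in the last highlight: the discounted value introduced by Shapley (1953) is the unique fixed point of v[s] = val(r[s] + δ · Σ_{s'} P(s' | s, ·, ·) v[s']), where val(·) denotes the minimax value of the auxiliary matrix game. The Python sketch below computes this fixed point by value iteration, with the matrix-game value obtained from a standard linear-programming formulation; the function names and LP setup are our own illustration, not code from the paper.

    import numpy as np
    from scipy.optimize import linprog

    def matrix_game_value(G):
        """Value of the zero-sum matrix game G (row player maximises), via an LP."""
        A, B = G.shape
        c = np.zeros(A + 1); c[-1] = -1.0            # variables (x_1..x_A, w); maximise w
        A_ub = np.hstack([-G.T, np.ones((B, 1))])    # w <= x @ G[:, b] for every column b
        A_eq = np.hstack([np.ones((1, A)), np.zeros((1, 1))])
        bounds = [(0, None)] * A + [(None, None)]    # x is a mixed strategy, w is free
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(B),
                      A_eq=A_eq, b_eq=np.ones(1), bounds=bounds)
        return -res.fun

    def shapley_value_iteration(r, P, delta, tol=1e-8):
        """Iterate v[s] <- val(r[s] + delta * P[s] @ v) to its unique fixed point."""
        v = np.zeros(r.shape[0])
        while True:
            v_new = np.array([matrix_game_value(r[s] + delta * (P[s] @ v))
                              for s in range(r.shape[0])])
            if np.max(np.abs(v_new - v)) < tol:
                return v_new
            v = v_new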



Introduction

Evolutionary and learning approaches to game theory justify equilibrium play as the end point of a dynamic process resulting from adaptations made by boundedly rational players. We finish by going further and proposing a variant of the best-response dynamic in which the payoff in each auxiliary game converges to the corresponding asymptotic value of the zero-sum stochastic game as the discount factor increases to 1. This is achieved by once again evolving one parameter slowly relative to the others; in this case the discount factor adjusts towards 1 even more slowly than the continuation payoffs. We postpone the literature review of stochastic games, and the positioning of our work within that literature, to Section 6.
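
The δ-converging variant can be illustrated with a small modification of the earlier sketch: three adaptation rates, with the discount factor drifting towards 1 even more slowly than the value beliefs. The normalised payoff (1 − δ)·r + δ·(continuation value) used below is an assumption made here so that value beliefs stay bounded as δ → 1; the game data and the rates dt, lam, and mu are again purely illustrative rather than taken from the paper.

    import numpy as np

    # Sketch of the delta-converging idea: three adaptation rates, with the
    # discount factor drifting toward 1 on the slowest timescale.
    rng = np.random.default_rng(1)
    S, A, B = 2, 2, 2
    r = rng.uniform(-1, 1, (S, A, B))
    P = rng.dirichlet(np.ones(S), (S, A, B))

    pi = np.full((S, A), 1.0 / A)
    sigma = np.full((S, B), 1.0 / B)
    v = np.zeros(S)
    delta = 0.5

    dt, lam, mu, T = 0.01, 0.05, 0.001, 200_000   # strategies > beliefs > discount
    for _ in range(T):
        for s in range(S):
            G = (1 - delta) * r[s] + delta * (P[s] @ v)   # normalised auxiliary game
            br1 = np.eye(A)[np.argmax(G @ sigma[s])]
            br2 = np.eye(B)[np.argmin(pi[s] @ G)]
            pi[s] += dt * (br1 - pi[s])                   # fastest: strategies
            sigma[s] += dt * (br2 - sigma[s])
            v[s] += dt * lam * (pi[s] @ G @ sigma[s] - v[s])   # slower: value beliefs
        delta += dt * mu * (1 - delta)                    # slowest: delta -> 1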

Contents

  • The Game Models
      • Zero-sum Normal-form Games
      • Zero-sum Stochastic Games
      • An Auxiliary Game
  • The Best-response Dynamic in a Stochastic Game
  • Continuous-time State-dependent Fictitious Play
  • The δ-converging Best-response Dynamic
  • Discussion