Abstract

There are only a few learning algorithms applicable to stochastic dynamic games. Learning in games is generally difficult because of the non-stationary environment in which each decision maker aims to learn its optimal decisions with minimal information in the presence of the other decision makers who are also learning. In the case of dynamic games, learning is more challenging because, while learning, the decision makers alter the state of the system and hence the future cost. In this paper, we present decentralized Q-learning algorithms for stochastic dynamic games, and study their convergence for the weakly acyclic case. We show that the decision makers employing these algorithms would eventually be using equilibrium policies almost surely in large classes of stochastic dynamic games.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call