Abstract
We study the stationary points and local geometry of gradient play for stochastic games (SGs), where each agent aims to maximize its own total discounted reward by acting independently based on the current state, which is observed by all agents. Policies are directly parameterized by the probability of choosing a certain action at a given state. We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting by establishing a gradient domination condition for SGs. We characterize the structure of strict NEs and show that gradient play locally converges to strict NEs in a finite number of steps. Further, for a subclass of SGs called Markov potential games, we prove that strict NEs are local maxima of the total potential function and thus locally stable under gradient play, while fully-mixed NEs are saddle points and thus unstable under gradient play.
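The gradient-play dynamics described above can be sketched in code. The following is a minimal illustrative example, not the paper's implementation: a hypothetical two-agent, two-state stochastic game with directly parameterized policies, where each agent ascends the (numerically estimated) gradient of its own discounted value and the update is projected back onto the probability simplex. All names, the toy game, and the step size are assumptions for illustration.

```python
import numpy as np

# Hypothetical toy setup (illustrative only, not from the paper):
# 2 states, 2 agents, 2 actions each; rewards and transitions drawn at random.
rng = np.random.default_rng(0)
S, A, gamma = 2, 2, 0.9
r = [rng.random((S, A, A)) for _ in range(2)]   # r[i][s, a1, a2]: agent i's reward
P = rng.random((S, A, A, S))                    # P[s, a1, a2, s']: transitions
P /= P.sum(axis=-1, keepdims=True)

def value(pi1, pi2, i):
    """Exact discounted value of agent i under the joint policy, uniform initial state."""
    joint = np.einsum('sa,sb->sab', pi1, pi2)          # joint action probabilities per state
    P_pi = np.einsum('sab,sabt->st', joint, P)         # induced state-to-state transitions
    r_pi = np.einsum('sab,sab->s', joint, r[i])        # induced per-state reward
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return V.mean()

def project_simplex(v):
    """Euclidean projection of each row of v onto the probability simplex."""
    u = np.sort(v, axis=1)[:, ::-1]
    css = np.cumsum(u, axis=1) - 1.0
    idx = np.arange(1, v.shape[1] + 1)
    rho = (u - css / idx > 0).sum(axis=1)
    theta = css[np.arange(v.shape[0]), rho - 1] / rho
    return np.maximum(v - theta[:, None], 0.0)

def grad(f, pi, eps=1e-6):
    """Central-difference gradient of f with respect to the policy table pi."""
    g = np.zeros_like(pi)
    for idx in np.ndindex(pi.shape):
        d = np.zeros_like(pi); d[idx] = eps
        g[idx] = (f(pi + d) - f(pi - d)) / (2 * eps)
    return g

# Direct parameterization: pi[s, a] is the probability of action a in state s.
pi1 = np.full((S, A), 1.0 / A)
pi2 = np.full((S, A), 1.0 / A)
eta = 0.1
for _ in range(50):  # simultaneous projected gradient play
    g1 = grad(lambda p: value(p, pi2, 0), pi1)
    g2 = grad(lambda p: value(pi1, p, 1), pi2)
    pi1 = project_simplex(pi1 + eta * g1)
    pi2 = project_simplex(pi2 + eta * g2)
```

After the loop, each row of `pi1` and `pi2` remains a valid probability distribution, and the policies have moved toward a first-order stationary point, which by the paper's gradient domination result corresponds to an NE.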