Abstract

This paper studies finite nonzero-sum Markov games under a discounted optimality criterion over an infinite horizon. The state process evolves according to a stochastic difference equation driven by the players' actions and a random disturbance whose distribution is unknown to the players. At each stage the players observe the actions, the states, and the realized disturbances; they then use the empirical distribution of the disturbances to estimate the true distribution and choose actions based on the available information. In this setting, we propose a procedure that converges almost surely (possibly after passing to a subsequence) to Nash equilibria of the Markov game with the true distribution of the random disturbance.
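The estimation step described above can be illustrated with a minimal sketch, assuming a finite disturbance support and i.i.d. observations; the support, probabilities, and sample size below are hypothetical and not taken from the paper. By the strong law of large numbers, the empirical frequencies converge almost surely to the true distribution, which is what underpins this kind of estimation scheme.

```python
import random
from collections import Counter

def empirical_distribution(samples, support):
    """Empirical frequency of each disturbance value in the observed sample."""
    counts = Counter(samples)
    n = len(samples)
    return {s: counts[s] / n for s in support}

# Hypothetical true disturbance distribution on a finite support.
support = [0, 1, 2]
true_probs = [0.5, 0.3, 0.2]

# Simulate the disturbances the players would observe over many stages.
random.seed(0)
samples = random.choices(support, weights=true_probs, k=10_000)

est = empirical_distribution(samples, support)

# Worst-case deviation of the empirical estimate from the true probabilities.
max_err = max(abs(est[s] - p) for s, p in zip(support, true_probs))
print(est, max_err)
```

In the game-theoretic setting, each player would plug such an estimate into the transition law of the stochastic difference equation and compute a best response against it; the almost-sure convergence of the estimate is what allows the resulting strategies to approximate equilibria of the game with the true distribution.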