Abstract

Stochastic games provide a framework for interactions among multiple agents and enable a myriad of applications. In these games, agents decide on actions simultaneously; after the actions are taken, the state of every agent updates to the next state, and each agent receives a reward. However, finding an equilibrium (if one exists) in such games is often difficult when the number of agents becomes large. This paper focuses on finding a mean-field equilibrium (MFE) in an action-coupled stochastic game setting in an episodic framework. It is assumed that an agent can approximate the impact of the other agents' actions by the empirical distribution of the actions. All agents know this action distribution and employ lower-myopic best response dynamics to choose the optimal oblivious strategy. This paper proposes a posterior sampling-based approach for reinforcement learning in the mean-field game, where each agent samples a transition probability from the previously observed transitions. We show that the policy and action distributions converge to the optimal oblivious strategy and the limiting distribution, respectively, which together constitute an MFE.
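To make the learning loop concrete, here is a minimal Python sketch of a posterior-sampling procedure of this flavor: in each episode the agent samples a transition model from the posterior built on previously observed transitions, computes an oblivious strategy by dynamic programming against the current action distribution, plays the episode, and then updates both the posterior counts and the action distribution. The problem sizes, function names, reward form, and the damped distribution update are illustrative assumptions, not the paper's exact Algorithm 1.

import numpy as np

# Minimal sketch of a posterior-sampling loop for an episodic, action-coupled
# mean-field game. All names, sizes, and the reward form are assumptions.
S, A, H = 5, 3, 10                      # states, actions, episode horizon (assumed)
rng = np.random.default_rng(0)

counts = np.ones((S, A, S))             # Dirichlet counts over observed transitions

def sample_model(counts):
    # Sample a transition kernel P(s' | s, a) from the posterior implied by
    # previously observed transitions (the posterior-sampling step).
    return np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                     for s in range(S)])

def reward(s, a, sigma):
    # Action-coupled reward: depends on the agent's own state/action and on
    # the population action distribution sigma (purely illustrative form).
    return -abs(a - sigma @ np.arange(A)) + 0.1 * s

def plan_oblivious(P, sigma):
    # Finite-horizon dynamic programming against a fixed action distribution,
    # yielding an oblivious strategy and its value function.
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = np.array([[reward(s, a, sigma) + P[s, a] @ V for a in range(A)]
                      for s in range(S)])
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V

sigma = np.ones(A) / A                  # initial guess for the mean action distribution
state = 0
for episode in range(200):
    P_hat = sample_model(counts)        # sample a model for this episode
    policy, _ = plan_oblivious(P_hat, sigma)
    visits = np.zeros(A)
    for h in range(H):
        a = policy[h, state]
        visits[a] += 1
        next_state = rng.choice(S, p=P_hat[state, a])   # stand-in for the true environment
        counts[state, a, next_state] += 1               # update the posterior counts
        state = next_state
    # Damped update of the empirical action distribution; the paper's
    # lower-myopic best response dynamics is a specific, more careful rule.
    sigma = 0.9 * sigma + 0.1 * visits / H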

Highlights

  • We live in a world where multiple agents interact repeatedly in a common environment

  • Learning in multi-agent reinforcement learning (MARL) is fundamentally different from the traditional single-agent reinforcement learning (RL) problem, since agents interact both with the environment and with each other

  • The optimal oblivious strategy obtained from Algorithm 1 and the limiting action distribution constitute a mean-field equilibrium, and the value function obtained from the algorithm converges to the optimal value function under the true distribution (a rough formalization of the equilibrium property follows this list)
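As a rough formalization of the equilibrium property referenced above, with assumed notation (an oblivious strategy \pi, a population action distribution \sigma, and V^{\pi}(s,\sigma) the value of playing \pi from state s when the population acts according to \sigma), a pair (\pi^{*}, \sigma^{*}) is a mean-field equilibrium when it satisfies a best-response condition and a consistency condition:

\pi^{*} \in \arg\max_{\pi} V^{\pi}(s,\sigma^{*}) \quad \text{for all } s, \qquad \sigma^{*}(a) = \sum_{s} \rho_{\pi^{*},\sigma^{*}}(s)\, \Pr\!\left[\pi^{*}(s) = a\right],

where \rho_{\pi^{*},\sigma^{*}} denotes the limiting state distribution induced when every agent plays \pi^{*} against \sigma^{*}. The paper's exact definition may differ in its details.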


Summary

Motivation

We live in a world where multiple agents interact repeatedly in a common environment. Analyzing such interactions directly becomes intractable as the number of agents grows. The mean-field game framework drastically reduces this complexity, since an agent only needs to consider the empirical distribution of the actions played by the other agents. Such mean-field games arise in several domains. For example, in investment settings with a large number of agents, the average investment made per agent impacts each individual agent's decision, so the game can be modeled as a mean-field game. Another example of a mean-field game is the demand response price in the smart grid [8,9].

Contribution
Related Literature
Multi-Player Stochastic Game
Mean-Field Game
Value Function, Q Function and Policy
Stationary Mean-Field Equilibrium
Proposed Algorithm
Convergence Result
Conditions for a Strategy to Be a MFE
Conditions of Lemma 1 Are Met for Any Optimal Oblivious Strategy
Sampling Does Not Lead to a Gap for Expected Value Function
Conclusions
