Abstract

Despite the proliferation of multi-agent deep reinforcement learning (MADRL), most existing methods do not scale well to dynamic agent populations. As the population grows, the dimensional explosion of the joint state-action space and the complex interactions among agents make learning extremely cumbersome, which poses a scalability challenge for MADRL. This paper focuses on the scalability of MADRL with homogeneous agents. In natural populations, local interaction is a more feasible mode of interplay than global interaction. Inspired by the strategic interaction model in economics, we decompose the value function of each agent into the sum of the expected cumulative rewards of the interactions between the agent and each of its neighbors. This value function is decentralized and decomposable, which allows it to scale to dynamic changes in the number of agents in large-scale populations. Accordingly, a strategic interaction reinforcement learning algorithm (SIQ) is proposed to learn the optimal policy of each agent, wherein a neural network estimates the expected cumulative reward of the interaction between the agent and one of its neighbors. We test the validity of the proposed method in a mixed cooperative-competitive confrontation game through numerical experiments. Furthermore, scalability comparison experiments illustrate that SIQ outperforms independent learning and mean field reinforcement learning algorithms in multiple scenarios with different and dynamically changing numbers of agents.
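
The abstract does not give the exact formula, but the described decomposition can be sketched as follows; the neighbor set, local observations, and the pairwise term are illustrative notation introduced here, not the paper's own symbols.

```latex
% Hedged sketch of the pairwise value decomposition described in the abstract.
% \mathcal{N}_i, o_i, and q are illustrative notation, not taken from the paper.
Q_i(s, a) \;\approx\; \sum_{j \in \mathcal{N}_i} q\!\left(o_i, a_i, o_j, a_j\right),
\qquad
q\!\left(o_i, a_i, o_j, a_j\right) \;=\;
\mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t}\, r^{ij}_{t} \,\middle|\, o_i, a_i, o_j, a_j\right],
```

where $r^{ij}_t$ denotes the reward attributed to the interaction pair $(i, j)$ and $\mathcal{N}_i$ is agent $i$'s neighbor set. Under this reading, each pairwise term is a standard expected discounted return, and the full Q function is their sum over neighbors, which is what lets the formulation scale with a changing population size.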

Highlights

  • Multi-agent reinforcement learning (MARL) combines game theory, multi-agent systems, and reinforcement learning (RL)

  • This paper focuses on scalability issues of multi-agent deep reinforcement learning (MADRL) for homogeneous agents

  • A novel MARL formulation based on the strategic interaction model of economics is presented, which approximates the Q function of each agent with the sum of the expected cumulative rewards of related interaction pairs


Summary

INTRODUCTION

Multi-agent reinforcement learning (MARL) combines game theory, multi-agent systems, and reinforcement learning (RL). Following the centralized training and decentralized execution paradigm, a strategic interaction reinforcement learning algorithm (SIQ) based on double Q networks is proposed, wherein a single policy is learned from the experiences of all homogeneous agents and shared among them. The major contribution of this paper is a decentralized and scalable MARL formulation based on the strategic interaction model, which handles the scalability issue of MADRL by approximately decomposing each agent's complex interactions with others into the sum of interaction pairs between the agent and each of its neighbors. This novel Q function applies both to traditional MARL and to MADRL.
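
Since the paper's implementation details are not reproduced here, the following is only a minimal PyTorch sketch of how a pairwise decomposition with double Q networks and shared parameters might look. `PairwiseQ`, `siq_q_values`, and `siq_target` are hypothetical names, and the input layout (own observation, neighbor observation, one-hot neighbor action) is an assumption rather than the paper's architecture.

```python
# Minimal sketch of an SIQ-style pairwise Q update (assumed names and layout).
import torch
import torch.nn as nn


class PairwiseQ(nn.Module):
    """Estimates the expected cumulative reward of one (agent, neighbor) interaction pair."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # Input: own observation + neighbor observation + one-hot neighbor action.
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim + n_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q value per own action
        )

    def forward(self, obs_i, obs_j, act_j_onehot):
        return self.net(torch.cat([obs_i, obs_j, act_j_onehot], dim=-1))


def siq_q_values(q_net, obs_i, neighbor_obs, neighbor_acts_onehot):
    """Q_i(s, .) approximated as the sum of pairwise terms over the neighbor set."""
    pair_q = torch.stack(
        [q_net(obs_i, o_j, a_j) for o_j, a_j in zip(neighbor_obs, neighbor_acts_onehot)]
    )
    return pair_q.sum(dim=0)  # shape: (n_actions,)


def siq_target(online, target, reward, gamma, next_obs_i, next_nb_obs, next_nb_acts):
    """Double-Q style target: the online network selects the action, the target network evaluates it."""
    with torch.no_grad():
        a_star = siq_q_values(online, next_obs_i, next_nb_obs, next_nb_acts).argmax()
        return reward + gamma * siq_q_values(target, next_obs_i, next_nb_obs, next_nb_acts)[a_star]
```

In an actual implementation the pairwise terms would be batched rather than looped, and the same network parameters would be reused by all homogeneous agents, consistent with the shared-policy scheme described above.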

PRELIMINARIES AND NOTATION
NUMERICAL EXPERIMENTS
Findings
CONCLUSION
