Abstract

Multiagent cooperation is one of the most attractive research fields in multiagent systems, and researchers have made many attempts to promote cooperative behavior. However, several issues remain, such as complex interactions among different groups of agents and redundant communication content from irrelevant agents, which hinder the learning and convergence of cooperative behaviors. To address these limitations, a novel method called multiagent hierarchical cognition difference policy (MA-HCDP) is proposed in this paper. It includes a hierarchical group network (HGN), a cognition difference network (CDN), and a soft communication network (SCN). HGN is designed to distinguish the underlying information in the observations of diverse groups (friendly, enemy, and object groups) and to extract a separate high-dimensional state representation for each group. CDN is designed based on a variational auto-encoder to allow each agent to choose its neighbors (communication targets) adaptively according to its environment cognition difference. SCN is designed to handle the complex interactions among the agents with a soft attention mechanism. Simulation results demonstrate the superior effectiveness of our method compared with existing methods.
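As a rough illustration only (not the authors' implementation), the sketch below shows how a VAE-style cognition encoder, a KL-based cognition difference, and soft-attention communication could be wired together in PyTorch. The layer sizes, the symmetrised-KL choice, and the threshold `tau` are all assumptions; in particular, whether a larger or smaller cognition difference should trigger communication is a design choice that the abstract alone does not pin down.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CognitionEncoder(nn.Module):
    """VAE-style encoder: maps an agent's observation to a diagonal Gaussian
    latent (mu, logvar) that serves as its 'environment cognition'."""
    def __init__(self, obs_dim, latent_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)

    def forward(self, obs):
        h = self.net(obs)
        return self.mu(h), self.logvar(h)

def kl_gauss(mu_p, logvar_p, mu_q, logvar_q):
    """KL(p || q) between two diagonal Gaussians, summed over latent dims."""
    var_p, var_q = logvar_p.exp(), logvar_q.exp()
    return 0.5 * ((var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0 + logvar_q - logvar_p).sum(-1)

def select_neighbors(mu, logvar, tau):
    """Pairwise cognition difference (symmetrised KL between agents' latents).
    Here we assume that a difference above the hypothetical threshold `tau`
    marks agent j as a communication target of agent i (mask[i, j] = 1)."""
    n = mu.shape[0]
    diff = torch.zeros(n, n)
    for i in range(n):
        for j in range(n):
            if i != j:
                diff[i, j] = 0.5 * (kl_gauss(mu[i], logvar[i], mu[j], logvar[j])
                                    + kl_gauss(mu[j], logvar[j], mu[i], logvar[i]))
    return (diff > tau).float()

class SoftCommunication(nn.Module):
    """Soft-attention aggregation of the other agents' hidden states,
    restricted to the mask produced by the cognition-difference step."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, h, mask):
        # h: (n_agents, hidden_dim); mask: (n_agents, n_agents), 1 = communicate
        scores = self.query(h) @ self.key(h).t() / h.shape[-1] ** 0.5
        scores = scores.masked_fill(mask == 0, float('-inf'))
        attn = F.softmax(scores, dim=-1)
        attn = torch.nan_to_num(attn)  # agents with no neighbors receive no message
        return attn @ self.value(h)
```

A usage sketch under the same assumptions: encode every agent's observation, build the neighbor mask from the pairwise differences, then aggregate messages with the soft-attention module before feeding the result to each agent's policy.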

Highlights

  • Based on the common paradigm of centralized learning with decentralized execution, some multiagent reinforcement learning (MARL) algorithms learn centralized critics for multiple agents and determine decentralized actions for execution

  • Although TRANSFER considers the influence of communication among different agents, it ignores the influence of redundant communication, which makes agents trained with TRANSFER obtain higher rewards than multiagent deep deterministic policy gradient (MADDPG) but converge more slowly


Summary

Introduction

Based on the common paradigm of centralized learning with decentralized execution, some MARL algorithms learn centralized critics for multiple agents and determine decentralized actions. When these methods are applied to environments with a large number of agents, they have limitations. Agents need to cooperate with each other to complete different tasks in partially observable environments, which are modeled as partially observable Markov games, an extension of Markov games [23]. They are defined by the environment state $S^t$, the action space $A^t = \{a_1^t, \cdots, a_N^t\}$, where $N$ is the number of agents and $a_i^t$ is the action of agent $i$ at time $t$, and the observation space $O^t = \{o_1^t, \cdots, o_N^t\}$. Each agent $i$ learns a policy conditioned on its own observation.
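For concreteness, here is a minimal sketch of the centralized-critic / decentralized-actor structure referred to above, in the spirit of MADDPG-style methods rather than the specific MA-HCDP networks; the module names and sizes are placeholders.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized actor: agent i maps only its own observation o_i^t to an
    action, so execution needs no global information."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs_i):
        return self.net(obs_i)

class CentralizedCritic(nn.Module):
    """Centralized critic used only during training: it conditions on the
    joint observations and joint actions of all N agents."""
    def __init__(self, obs_dim, act_dim, n_agents, hidden=128):
        super().__init__()
        in_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, all_obs, all_acts):
        # all_obs: (batch, N, obs_dim); all_acts: (batch, N, act_dim)
        x = torch.cat([all_obs.flatten(1), all_acts.flatten(1)], dim=-1)
        return self.net(x)
```

During training the critic sees the joint observation-action of all $N$ agents; at execution time only the per-agent actors are used, each conditioned on its local observation.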

