Multi-agent actor centralized-critic with communication

David Simões,Nuno Lau,Luís Paulo Reis

doi:10.1016/j.neucom.2020.01.079

Abstract

Multiple real-world problems are naturally modeled as cooperative multi-agent systems, ranging from satellite formation to traffic monitoring. These systems require algorithms that can learn successful policies with independent agents that rely solely on local partial-observations of the environment. However, multi-agent environments are more complex, due to their partial-observability and non-stationarity from an agent’s perspective, as well as the structural credit assignment problem and the curse of dimensionality, and achieving coordination in such systems remains a complex challenge. To this end, we propose a multi-agent actor-critic algorithm called Asynchronous Advantage Actor Centralized-Critic with Communication (A3C3). A3C3 uses a centralized critic to estimate a value function, decentralized actors to approximate each agent’s policy function, and decentralized communication networks for each agent to share relevant information with its team. The critic can incorporate additional information, like the environment’s global state, when available, and optimizes the actor networks. The actor networks of an agent’s teammates optimize that agent’s communication network, such that each agent learns to output information that is relevant to the policies of others. A3C3 supports a dynamic amount of agents, noisy communication mediums, and can be horizontally scaled to shorten its learning phase. We evaluate A3C3 in two partially-observable multi-agent suites where agents benefit from communicating local information to each other. A3C3 outperforms state-of-the-art multi-agent algorithms, independent approaches, and centralized controllers with access to all agents’ observations.

Full Text