Learning Communication for Cooperation in Dynamic Agent-Number Environment

Weiwei Liu, Qi Wang, Junjie Cao, Xiaolei Lang, Yong Liu, Shanqi Liu

doi:10.1109/tmech.2021.3076080

Abstract

In many real-world multiagent systems, such as storage robots and drone clusters, the number of agents changes continually. Yet most current multiagent reinforcement learning (RL) algorithms are limited to fixed network dimensions and rely on prior knowledge to preset the number of agents during training, which leads to poor generalization. In addition, these algorithms use centralized training to address the instability of multiagent systems. However, centralized learning in large-scale multiagent RL causes an explosion of network dimensions, which severely limits its scalability. To address these two difficulties, in this article we propose a group centralized training and decentralized execution-unlimited dynamic agent-number network (GCTDE-UDAN). First, because we use an attention mechanism to select several leaders and establish a dynamic number of teams, and because the UDAN performs a nonlinear combination of all agents' Q values during value decomposition, the method is not affected by changes in the number of agents. Moreover, our algorithm can unite any agents into a group and conduct centralized training within the group, avoiding the network dimension explosion caused by global centralized training over a large number of agents. Finally, we verify on simulation and experimental platforms that the algorithm can learn and perform cooperative behaviors in many dynamic multiagent environments.
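
To make the agent-number independence concrete, the following is a minimal illustrative sketch and not the authors' GCTDE-UDAN network. It assumes each agent already has a scalar attention score and a scalar Q value. The function names (`select_leaders`, `mix_q_values`) and the choice of `tanh` as the nonlinearity are hypothetical. The point is only that attention-weighted pooling accepts any number of agents without changing any network dimension.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of any length.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_leaders(scores, k):
    # Hypothetical leader selection: pick the k agents with the
    # highest attention weight and return their indices in order.
    weights = softmax(scores)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(order[:min(k, len(order))])

def mix_q_values(q_values, scores):
    # Assumed mixing rule: attention-weighted pooling of per-agent
    # Q values followed by a nonlinearity. Nothing here depends on
    # a fixed agent count.
    weights = softmax(scores)
    pooled = sum(w * q for w, q in zip(weights, q_values))
    return math.tanh(pooled)

# The same functions handle a team of 3 and a team of 5 agents.
q_small = mix_q_values([1.0, 0.5, -0.2], [0.3, 0.1, 0.9])
q_large = mix_q_values([1.0, 0.5, -0.2, 0.7, 0.0], [0.3, 0.1, 0.9, 0.4, 0.2])
leaders = select_leaders([0.1, 2.0, 1.0, 3.0], k=2)
```

Because the pooled value is a weighted sum, the result is also unchanged when the agents are reordered, which is a useful property when agents join or leave a group.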

Full Text