Learning distributed cooperative policies for large-scale multirobot systems remains a challenging task in multiagent reinforcement learning (MARL). In this work, we model the interactions among the robots as a graph and propose a novel off-policy actor-critic MARL algorithm that trains distributed coordination policies on the graph by leveraging the information-extraction ability of graph neural networks (GNNs). First, a new type of Gaussian policy parameterized by GNNs is designed for distributed decision-making in continuous action spaces. Second, a scalable centralized value function network is designed based on a novel GNN-based value function decomposition technique. Then, building on the designed actor and critic networks, a GNN-based MARL algorithm named graph soft actor-critic (G-SAC) is proposed and used to train the distributed policies in an effective and centralized fashion. Finally, two custom multirobot coordination environments are built, and simulations in these environments empirically demonstrate the sample efficiency and scalability of G-SAC as well as the strong zero-shot generalization ability of the trained policies in large-scale multirobot coordination problems.
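To make the actor design concrete, the sketch below shows one way a GNN-parameterized Gaussian policy for distributed continuous control could look. This is a minimal illustration, not the authors' implementation: the single round of mean-aggregated message passing, the layer sizes, the log-std clamping, and the tanh squashing are all assumptions introduced here for clarity.

```python
# Illustrative sketch (not the paper's code) of a GNN-parameterized Gaussian policy.
import torch
import torch.nn as nn


class GraphGaussianPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        # Message function applied to each robot's embedding before aggregation.
        self.message = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        # Update function combining a robot's own embedding with aggregated neighbor messages.
        self.update = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU())
        self.mean_head = nn.Linear(hidden_dim, act_dim)
        self.log_std_head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs: torch.Tensor, adj: torch.Tensor):
        # obs: (n_robots, obs_dim); adj: (n_robots, n_robots) 0/1 interaction graph.
        h = self.encode(obs)
        msgs = self.message(h)
        # One round of message passing: mean-aggregate messages from neighbors.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        agg = adj @ msgs / deg
        h = self.update(torch.cat([h, agg], dim=-1))
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(-5.0, 2.0)
        return torch.distributions.Normal(mean, log_std.exp())


if __name__ == "__main__":
    n, obs_dim, act_dim = 5, 8, 2
    policy = GraphGaussianPolicy(obs_dim, act_dim)
    obs = torch.randn(n, obs_dim)
    adj = (torch.rand(n, n) < 0.4).float()
    dist = policy(obs, adj)
    actions = torch.tanh(dist.rsample())  # squash to bounded continuous actions
    print(actions.shape)                  # torch.Size([5, 2])
```

Because each robot's action depends only on its own observation and messages from its graph neighbors, a policy of this form can be executed in a distributed fashion and applied to team sizes not seen during training, which is the property the zero-shot generalization experiments probe.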