Mainstream Multi-Agent Reinforcement Learning (MARL) methods introduce teammate modeling or communication mechanisms into the Centralized Training Decentralized Execution (CTDE) paradigm to improve coordination performance. However, existing teammate modeling methods predict either teammates' actions or their local observations, but not both, which limits their applicability. In addition, traditional communication mechanisms consider only the quantity of communication links while ignoring the quality of the retained links, leading to inefficient and redundant communication. To address these problems, this paper proposes a novel Multi-Agent Cooperative Strategy with Explicit Teammate Modeling and Targeted Informative Communication (MACS), which generates and sends more informative messages with higher communication efficiency, further improving coordination performance. Specifically, a Variational Auto-Encoder (VAE) allows each agent to simultaneously predict teammates' observations and actions, generating more comprehensive communication messages. We then propose a new Mutual Information (MI) objective between the communication message and the teammate's Q-value, which yields informative messages and ensures the exploration and stability of the method. In addition, a targeted dynamic informative communication graph is built with a Graph Neural Network (GNN), which prunes redundant communication links through hypothetical analysis, further improving overall communication efficiency. Finally, we conduct experiments in the StarCraft II, Collaborative Navigation, and Multi-Target Multi-Sensor Coverage environments. Experimental results show that the proposed approach outperforms state-of-the-art methods in both coordination performance and communication efficiency.
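The "hypothetical analysis" used to prune redundant links can be illustrated, in spirit, by the following toy sketch: each incoming link is hypothetically removed, and only links whose removal noticeably shifts the agent's Q estimate are kept. The stand-in `q_value` function, the threshold, and all names here are illustrative assumptions, not the paper's actual GNN-based model:

```python
import numpy as np

rng = np.random.default_rng(0)

def q_value(obs, messages):
    """Toy stand-in for an agent's Q estimate (hypothetical): a fixed
    score over its observation and the sum of incoming messages."""
    if not messages:
        return float(obs.sum())
    return float(obs.sum() + np.tanh(np.sum(messages, axis=0)).sum())

def prune_links(obs, incoming, threshold=0.05):
    """Hypothetically drop each incoming link in turn and keep only the
    links whose removal changes the Q estimate by more than `threshold`."""
    base_q = q_value(obs, incoming)
    kept = []
    for i in range(len(incoming)):
        reduced = [m for j, m in enumerate(incoming) if j != i]
        if abs(base_q - q_value(obs, reduced)) > threshold:
            kept.append(i)  # link i is informative: removing it shifts Q
    return kept

obs = rng.normal(size=4)
# Link 1 carries an all-zero (uninformative) message in this toy setup.
incoming = [rng.normal(size=4), np.zeros(4), rng.normal(size=4)]
kept = prune_links(obs, incoming)
print(kept)  # link 1 is pruned; informative links are retained
```

In the paper's actual method this per-link counterfactual check is realized by the GNN over the dynamic communication graph rather than by explicit re-evaluation, but the pruning criterion is the same in outline.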