This work focuses on multi-agent reinforcement learning (RL) with inter-agent communication, in which communication is differentiable and optimized through backpropagation. Such differentiable approaches tend to converge more quickly to higher-quality policies than techniques that treat communication as actions in a traditional RL framework. However, modern communication networks (e.g., Wi-Fi or Bluetooth) rely on discrete communication channels, to which existing differentiable approaches designed for real-valued messages cannot be applied directly, or can be applied only with biased gradient estimators. Some works have overcome this problem by treating the message space as an extension of the action space and using standard RL to optimize message selection, but these methods tend to converge more slowly and to inferior policies. In this paper, we propose a stochastic message encoding/decoding procedure that makes a discrete communication channel mathematically equivalent to an analog channel with additive noise, through which gradients can be backpropagated. Additionally, we introduce an encryption step for use in noisy channels that forces channel noise to be message-independent, allowing us to compute unbiased derivative estimates even in the presence of unknown channel noise. To the best of our knowledge, this work presents the first differentiable communication learning approach that can compute unbiased derivatives through channels with unknown noise. We demonstrate the effectiveness of our approach in two example multi-robot tasks: a path-finding problem and a collaborative search problem. There, we show that our approach achieves learning speed and performance similar to differentiable communication learning with real-valued messages (i.e., unlimited communication bandwidth), while naturally handling more realistic real-world communication constraints.

Content Areas: Multi-Agent Communication, Reinforcement Learning.
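To make the core idea of the abstract concrete, the sketch below shows one way a stochastic encode/decode pair over a discrete symbol set can behave like an analog channel with additive, message-independent noise: subtractive dithered quantization. This is only an illustrative example under that assumption, not the paper's actual procedure; the step size `delta`, the shared dither `u`, and the `encode`/`decode` helpers are all hypothetical names introduced here for illustration.

```python
import numpy as np

# Minimal sketch (assumption: subtractive dithered quantization as one concrete
# realization of a stochastic encode/decode that makes a discrete channel look
# like an analog channel with additive, message-independent noise).

rng = np.random.default_rng(0)
delta = 0.25  # spacing between discrete channel symbols (illustrative choice)

def encode(m, u):
    """Map a real-valued message to a discrete symbol index.

    The sender adds a dither u (shared with the receiver, e.g. via a common
    random seed) before rounding onto the symbol grid.
    """
    return np.round((m + u) / delta).astype(int)

def decode(k, u):
    """Reconstruct a real value from the received symbol index by
    subtracting the same shared dither."""
    return k * delta - u

# Empirical check of the claimed equivalence: the reconstruction error is
# uniform on [-delta/2, delta/2] and (approximately) uncorrelated with m,
# i.e. the end-to-end channel acts like m + additive noise.
m = rng.uniform(-1.0, 1.0, size=100_000)            # real-valued messages
u = rng.uniform(-delta / 2, delta / 2, size=m.shape)  # shared dither
err = decode(encode(m, u), u) - m

print("error mean   :", err.mean())                   # ~ 0
print("error range  :", err.min(), err.max())         # within [-delta/2, delta/2]
print("corr(err, m) :", np.corrcoef(err, m)[0, 1])    # ~ 0 (message-independent)
```

Because the reconstruction equals the original message plus noise whose distribution does not depend on the message, gradients can be passed through the reconstruction as if the channel were a simple additive-noise channel, which is the property the abstract relies on.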