Abstract

Reinforcement learning in multi-agent settings is important for many real-world applications, but it poses greater challenges than the single-agent case. In particular, agents in multi-agent settings tend to overestimate the value function. In this work, we focus on overestimation bias with continuous actions in multi-agent learning environments. We propose to reduce this bias by adopting a distributional perspective on reinforcement learning, combining it with an off-policy actor-critic framework in a novel approach, Multi-Agent Deep Distributional Deterministic Policy Gradient (MAD3PG). We empirically evaluate MAD3PG in three competitive and cooperative multi-agent settings. Our results show that, on a series of difficult motor tasks, agents trained with MAD3PG significantly outperform existing baselines.
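The sketch below illustrates the core idea the abstract describes: replacing the scalar critic of a deterministic actor-critic agent with a critic that outputs a categorical return distribution, whose expectation drives the policy gradient. It is a minimal illustration, not the authors' MAD3PG implementation; the network sizes, atom count, and value bounds are assumptions for demonstration only.

```python
# Minimal sketch (assumed details, not the paper's released code): a categorical
# distributional critic paired with a deterministic actor, as in distributional
# deterministic policy gradient methods.
import torch
import torch.nn as nn

class DistributionalCritic(nn.Module):
    """Maps (observation, action) to a probability mass over fixed return atoms."""
    def __init__(self, obs_dim, act_dim, n_atoms=51, v_min=-10.0, v_max=10.0):
        super().__init__()
        # Fixed support of the return distribution (illustrative bounds).
        self.register_buffer("atoms", torch.linspace(v_min, v_max, n_atoms))
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_atoms),
        )

    def forward(self, obs, act):
        logits = self.net(torch.cat([obs, act], dim=-1))
        return torch.softmax(logits, dim=-1)  # distribution over return atoms

    def q_value(self, obs, act):
        # Expected return under the predicted distribution; this scalar is what
        # the deterministic actor's policy gradient ascends.
        return (self.forward(obs, act) * self.atoms).sum(dim=-1)


class DeterministicActor(nn.Module):
    """Outputs a continuous action for a given observation."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)


# Actor update sketch: maximize the critic's expected value of the actor's action.
obs_dim, act_dim = 8, 2
actor = DeterministicActor(obs_dim, act_dim)
critic = DistributionalCritic(obs_dim, act_dim)
obs = torch.randn(32, obs_dim)
actor_loss = -critic.q_value(obs, actor(obs)).mean()
actor_loss.backward()
```

In a multi-agent setting, each agent would hold its own actor and distributional critic; the critic's distributional output is the mechanism the abstract credits with reducing overestimation of the value function.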
