Abstract

Reinforcement learning in multi-agent settings is important for many real-world applications, but it poses greater challenges than the single-agent case. In particular, agents in multi-agent settings tend to overestimate the value function. In this work, we focus on overestimation bias with continuous actions in multi-agent learning environments. We propose to reduce this bias by adopting a distributional perspective on reinforcement learning, combining it with an off-policy actor-critic framework in a novel approach, Multi-Agent Deep Distributional Deterministic Policy Gradient (MAD3PG). We empirically evaluate MAD3PG in three competitive and cooperative multi-agent settings. Our results show that, on a series of difficult motor tasks, agents trained with MAD3PG significantly outperform existing baselines.
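The sketch below illustrates the core idea the abstract describes: replacing the scalar critic of a deterministic actor-critic agent with a critic that outputs a categorical return distribution, whose expectation drives the policy gradient. It is a minimal illustration, not the authors' MAD3PG implementation; the network sizes, atom count, and value bounds are assumptions for demonstration only.

```python
# Minimal sketch (assumed details, not the paper's released code): a categorical
# distributional critic paired with a deterministic actor, as in distributional
# deterministic policy gradient methods.
import torch
import torch.nn as nn

class DistributionalCritic(nn.Module):
    """Maps (observation, action) to a probability mass over fixed return atoms."""
    def __init__(self, obs_dim, act_dim, n_atoms=51, v_min=-10.0, v_max=10.0):
        super().__init__()
        # Fixed support of the return distribution (illustrative bounds).
        self.register_buffer("atoms", torch.linspace(v_min, v_max, n_atoms))
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_atoms),
        )

    def forward(self, obs, act):
        logits = self.net(torch.cat([obs, act], dim=-1))
        return torch.softmax(logits, dim=-1)  # distribution over return atoms

    def q_value(self, obs, act):
        # Expected return under the predicted distribution; this scalar is what
        # the deterministic actor's policy gradient ascends.
        return (self.forward(obs, act) * self.atoms).sum(dim=-1)


class DeterministicActor(nn.Module):
    """Outputs a continuous action for a given observation."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)


# Actor update sketch: maximize the critic's expected value of the actor's action.
obs_dim, act_dim = 8, 2
actor = DeterministicActor(obs_dim, act_dim)
critic = DistributionalCritic(obs_dim, act_dim)
obs = torch.randn(32, obs_dim)
actor_loss = -critic.q_value(obs, actor(obs)).mean()
actor_loss.backward()
```

In a multi-agent setting, each agent would hold its own actor and distributional critic; the critic's distributional output is the mechanism the abstract credits with reducing overestimation of the value function.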
