Abstract

Reinforcement learning in multi-agent settings is important for many real-world applications, but it poses greater challenges than single-agent environments. In particular, agents in the multi-agent setting tend to overestimate the value function. In this work, we focus on the issue of overestimation bias with continuous actions in multi-agent learning. We propose to reduce this bias by adopting a distributional perspective on reinforcement learning, and we combine it with an off-policy actor-critic framework into a novel approach, Multi-Agent Deep Distributional Deterministic Policy Gradient (MAD3PG). We evaluate MAD3PG empirically in three competitive and cooperative multi-agent settings. Our results show that, on a series of difficult motor tasks, agents trained with MAD3PG significantly outperform existing benchmarks.
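
The abstract describes a distributional critic embedded in an off-policy actor-critic (MADDPG/D4PG-style) framework. Since the full text is not available here, the Python sketch below only illustrates the general idea such an approach relies on: a categorical value distribution over fixed atoms and its projected Bellman target, trained with a cross-entropy loss. The function names, atom bounds, and projection routine are assumptions for illustration, not the paper's actual implementation.

    import numpy as np

    # Hypothetical support for the categorical value distribution (C51-style atoms);
    # the bounds and number of atoms are illustrative assumptions, not from the paper.
    V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51
    atoms = np.linspace(V_MIN, V_MAX, N_ATOMS)            # support points z_i
    delta_z = (V_MAX - V_MIN) / (N_ATOMS - 1)

    def project_distribution(next_probs, rewards, dones, gamma=0.99):
        """Project the Bellman-shifted target distribution back onto the fixed atoms.

        next_probs : (batch, N_ATOMS) probabilities from a target critic
        rewards    : (batch,) immediate rewards
        dones      : (batch,) 1.0 if the episode terminated, else 0.0
        Returns the projected target probabilities, shape (batch, N_ATOMS).
        """
        batch = next_probs.shape[0]
        # Bellman update applied to every atom: T z_i = r + gamma * (1 - done) * z_i
        tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * atoms[None, :]
        tz = np.clip(tz, V_MIN, V_MAX)
        # Fractional index of each shifted atom on the fixed support.
        b = (tz - V_MIN) / delta_z
        lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)

        target = np.zeros((batch, N_ATOMS))
        for i in range(batch):
            for j in range(N_ATOMS):
                l, u = lower[i, j], upper[i, j]
                if l == u:   # shifted atom lands exactly on a support point
                    target[i, l] += next_probs[i, j]
                else:        # split probability mass between the two neighbours
                    target[i, l] += next_probs[i, j] * (u - b[i, j])
                    target[i, u] += next_probs[i, j] * (b[i, j] - l)
        return target

    def critic_loss(pred_probs, target_probs, eps=1e-8):
        """Cross-entropy between predicted and projected target distributions."""
        return -np.mean(np.sum(target_probs * np.log(pred_probs + eps), axis=1))

    # Tiny smoke test with random target-critic outputs (illustration only).
    rng = np.random.default_rng(0)
    probs = rng.random((4, N_ATOMS))
    probs /= probs.sum(axis=1, keepdims=True)
    tgt = project_distribution(probs,
                               rewards=np.array([1.0, -0.5, 0.2, 0.0]),
                               dones=np.array([0.0, 0.0, 1.0, 0.0]))
    print(critic_loss(probs, tgt))

In a MAD3PG-like setup, one would presumably train a centralized per-agent critic that outputs such atom probabilities against the projected target with this cross-entropy loss, and update the deterministic policy through the expected value np.sum(atoms * pred_probs, axis=1); this is an assumption about the general recipe, not the paper's exact algorithm.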
