Abstract

One of the toughest challenges in multi-agent deep reinforcement learning (MADRL) is that when opponents' policies change rapidly, collaborative agents cannot learn to respond to them effectively: the policy learned by the collaborative agents may be only locally optimal with respect to the opponents' current policies. To address this problem, we propose a novel algorithm termed Friend-or-Foe Deep Deterministic Policy Gradient (FD2PG), which trains cooperative agents that are more robust and cooperate more strongly in continuous action spaces. These collaborative agents generalize easily and respond correctly even when their opponents' policies change. Inspired by the classic Friend-or-Foe Q-learning algorithm (FFQ), we introduce the idea of minimizing over foes and maximizing over friends into the centralized-training, distributed-execution framework of the multi-agent deep deterministic policy gradient algorithm (MADDPG) to enhance the collaborative agents' robustness and cooperativity. In addition, we introduce a Minimax Multi-Agent Learning (MMAL) method to explore two special equilibria (the adversarial equilibrium and the coordination equilibrium), which guarantees the convergence of FD2PG and improves optimization. Extensive fine-grained experiments, including four representative scenario experiments and two scale-performance correlation experiments, demonstrate the superior performance of FD2PG compared with existing baselines.
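As a point of reference for the "maximize over friends, minimize over foes" idea that the abstract borrows from FFQ, the sketch below illustrates the friend-or-foe value in a simplified setting: a single state, a tabular Q-function, and pure strategies only (classic FFQ allows mixed strategies, and FD2PG replaces the tabular maximization with deterministic policy gradients in continuous action spaces). The Q-values and dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical toy setup: one state, 3 candidate friend joint-actions, 4 foe joint-actions.
# Q[i, j] is the collaborative team's estimated return when friends play joint action i
# and foes play joint action j (illustrative values only).
Q = np.array([
    [ 1.0, -0.5,  0.2,  0.0],
    [ 0.8,  0.6, -1.0,  0.3],
    [-0.2,  0.4,  0.5,  0.1],
])

# Friend-or-foe value (pure-strategy simplification of FFQ):
# friends maximize the return, assuming foes respond with the minimizing joint action.
foe_best_response = Q.min(axis=1)            # worst case over foes, for each friend action
best_friend_action = foe_best_response.argmax()
ffq_value = foe_best_response.max()

print(f"friend joint-action {best_friend_action} guarantees value {ffq_value:.2f}")
# -> friend joint-action 2 guarantees value -0.20
```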
