Abstract
Deep multi-agent reinforcement learning (MARL) can efficiently learn decentralized policies for real-world applications. However, current MARL methods struggle to transfer knowledge from previously learned tasks to improve exploration. In this paper, we propose a novel MARL method called Qauxi, which forms a coordinated exploration scheme that improves traditional MARL algorithms by reusing meta-experience transferred from the auxiliary task. We also use a weighting function to weight the importance of joint actions in the monotonic loss function, so that training focuses on more important joint actions and avoids converging to suboptimal policies. Furthermore, we prove the convergence of Qauxi based on the contraction mapping theorem. Qauxi is evaluated on the widely adopted StarCraft Multi-Agent Challenge (SMAC) benchmarks across easy, hard, and super hard scenarios. Experimental results show that the proposed method outperforms state-of-the-art MARL methods by a large margin in the most challenging super hard scenarios.
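The abstract does not give the exact form of the weighting function. Purely as an illustration, the minimal sketch below shows one plausible way a weighting function could scale joint-action terms in a QMIX-style monotonic TD loss; the down-weighting rule here mirrors the optimistic weighting used in Weighted QMIX and is an assumption, not Qauxi's definition. All names (`weighted_monotonic_td_loss`, `w_min`) are hypothetical.

```python
# Sketch only (not the paper's implementation): weighting joint actions in a
# QMIX-style monotonic TD loss. The weighting rule below is a hypothetical
# placeholder; Qauxi's actual weighting function is defined in the paper.
import torch

def weighted_monotonic_td_loss(q_tot, q_tot_target, rewards, dones,
                               gamma=0.99, w_min=0.5):
    """q_tot:        Q_tot(s, u) from the monotonic mixing network, shape [B]
       q_tot_target: max_u' Q_tot(s', u') from the target network, shape [B]
       rewards, dones: per-transition reward and terminal flag, shape [B]"""
    # One-step TD target: y = r + gamma * (1 - done) * max_u' Q_tot(s', u')
    targets = rewards + gamma * (1.0 - dones) * q_tot_target
    td_error = targets.detach() - q_tot
    # Hypothetical weighting: keep full weight on underestimated joint actions
    # (td_error >= 0) and down-weight the rest, so the loss concentrates on
    # joint actions deemed more important.
    weights = torch.where(td_error >= 0,
                          torch.ones_like(td_error),
                          torch.full_like(td_error, w_min))
    return (weights * td_error.pow(2)).mean()
```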