Abstract

Leveraging global state information to enhance policy optimization is a common approach in multi-agent reinforcement learning (MARL). Even when supplemented with state information, agents still suffer from insufficient exploration during training. Moreover, training with examples batch-sampled from the replay buffer induces policy overfitting; for example, multi-agent proximal policy optimization (MAPPO) may not perform as well as independent PPO (IPPO) even though its centralized critic has access to additional information. In this paper, we propose a novel noise-injection method to regularize the agents’ policies and mitigate the overfitting issue. We analyze the cause of policy overfitting in actor-critic MARL and design two patterns of noise injection that apply random Gaussian noise to the advantage function to stabilize training and improve performance. Experimental results on the Matrix Game and StarCraft II show that our method achieves higher training efficiency and superior performance, and ablation studies indicate that it maintains higher entropy in the agents’ policies during training, which leads to more exploration.
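
To make the idea of noise injection on the advantage function concrete, the following is a minimal sketch and not the method as specified in the paper: the abstract does not describe the two injection patterns, so the additive form, the function name `noisy_advantages`, and the parameters `sigma` and `seed` are all illustrative assumptions. It shows only the general idea that each agent receives its own Gaussian-perturbed copy of a shared advantage estimate, so the agents' policy updates are no longer driven by identical signals.

```python
import random
from typing import List


def noisy_advantages(
    advantages: List[float],
    n_agents: int,
    sigma: float = 1.0,
    seed: int = 0,
) -> List[List[float]]:
    """Return one Gaussian-perturbed copy of the shared advantages per agent.

    Assumption: additive zero-mean noise with standard deviation `sigma`.
    This is a hypothetical illustration, not the paper's exact pattern.
    """
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    per_agent = []
    for _ in range(n_agents):
        # Each agent draws independent noise for every sampled advantage.
        per_agent.append([a + rng.gauss(0.0, sigma) for a in advantages])
    return per_agent


if __name__ == "__main__":
    shared = [1.0, -0.5, 0.25]  # advantages from a centralized critic (made-up values)
    for agent_id, adv in enumerate(noisy_advantages(shared, n_agents=3, sigma=0.5)):
        print(agent_id, adv)
```

In this sketch, setting `sigma` to zero recovers the unperturbed shared advantages, while larger values make the per-agent signals differ more; how the noise scale should be chosen is not stated in the abstract.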
