Abstract

Multi-agent deep reinforcement learning (MDRL) is an emerging research hotspot and application direction in machine learning and artificial intelligence. MDRL encompasses many algorithms, rules, and frameworks, and is currently being applied to swarm systems, energy allocation optimization, stock analysis, and sequential social dilemmas, with an extremely promising future. In this paper, a parallel-critic method based on the classic MDRL algorithm MADDPG is proposed to alleviate the training-instability problem in cooperative-competitive multi-agent environments. Furthermore, a policy smoothing technique is introduced into the proposed method to decrease the variance of the learned policies. The proposed method is evaluated in three scenarios of the authoritative multi-agent particle environment (MPE). Statistical analysis of the experimental results shows that our method significantly improves training stability and performance compared with vanilla MADDPG.
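The policy smoothing mentioned above can be illustrated with a TD3-style perturbation of the target action with clipped noise, so that the critic's value estimate is effectively averaged over nearby actions. This is a plausible sketch only; the function name, noise parameters, and action bounds are assumptions, not necessarily the paper's exact technique:

```python
import random

def smoothed_target_action(policy_action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Perturb a target-policy action with clipped Gaussian noise
    (TD3-style target policy smoothing; parameters are illustrative)."""
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    # Keep the smoothed action inside the valid action range.
    return max(low, min(high, policy_action + noise))
```

Because the noise is clipped, the smoothed action never strays more than `noise_clip` from the original action, which bounds how much the value target can change.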

Highlights

  • Reinforcement learning (RL) [1] is an important branch of machine learning

  • Figure 5 shows the mean-episode-reward curves of the original Multi-agent Deep Deterministic Policy Gradient (MADDPG) and MADDPG-PC (U = 2, 3, 4) in three scenarios over 60,000 training episodes; the curves stabilize after roughly 30,000 episodes in each scenario, so our calculations are based on test data from after episode 30,000

  • We can conclude that MADDPG-PC stabilizes the training process to a significant degree by training parallel critics simultaneously; most of the time, stability improves as the number of critics increases, presumably because more critics supply more stable policies
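The parallel-critic idea behind the highlight above can be sketched as maintaining U independent value estimates and combining them (here by averaging) into one lower-variance TD target. The class name, the averaging rule, and the scalar toy state are illustrative assumptions, not the paper's exact update:

```python
class ParallelCritics:
    """Toy sketch of U parallel critics for a single state-action pair.
    Each critic holds its own estimate; the TD target is built from the
    critics' averaged next-state value to damp individual noise."""

    def __init__(self, num_critics, lr=0.1, gamma=0.95):
        self.q = [0.0] * num_critics  # one scalar estimate per critic
        self.lr = lr
        self.gamma = gamma

    def update(self, reward):
        # Average across parallel critics, then move every critic
        # toward the shared, lower-variance target.
        next_value = sum(self.q) / len(self.q)
        target = reward + self.gamma * next_value
        for i in range(len(self.q)):
            self.q[i] += self.lr * (target - self.q[i])
        return target
```

Averaging over several critics reduces the variance of the target relative to any single critic's estimate, which is one intuition for why more critics can yield more stable training.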


Summary

Introduction

Reinforcement learning (RL) [1] is an important branch of machine learning. The essence of RL is that agents learn policies through interaction with the environment in order to maximize returns or achieve specific goals. Unlike supervised learning, which tells the agent directly which actions are correct, RL evaluates and corrects action selection based on the feedback signal from the environment. RL is well suited to complicated decision-making problems because reward functions are easier to design and less information is required. Deep reinforcement learning (DRL) [2], which combines deep neural networks (DNN) with traditional RL methods, has become a research hotspot and made tremendous breakthroughs in computer vision, robot control, large-scale real-time strategy games, etc.
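The interaction loop described above (act, observe feedback, correct the action-value estimates) can be sketched with tabular Q-learning on a toy chain environment; the environment, hyperparameters, and function name are illustrative assumptions, not anything from the paper:

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain: the agent moves left/right on
    states 0..n_states-1 and earns reward 1 for reaching the right end.
    Estimates are corrected from environment feedback, not from labels."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action], 0=left, 1=right
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # Epsilon-greedy action selection: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # TD update: move q[s][a] toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

After training, the learned values prefer moving right (toward the reward) in every state, even though the agent was never told which action is correct.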


