Abstract

The field of Multi-Agent Reinforcement Learning (MARL) continues to advance with the development of new and effective methods. This research centers on two prominent approaches within the field: Multi-Agent Deep Deterministic Policy Gradient (MADDPG) and Go-Explore. The study explores the synergistic potential of combining these two methods to enhance rewards for individual agents as well as for agent groups. In this work, MADDPG is introduced into the experimental environment, providing each agent with an actor network (policy network) and a critic network (Q network) to implement the actor-critic model. Additionally, each agent is equipped with a Go-Explore network, enabling it to explore the environment more deeply and accumulate rewards at an accelerated rate, often resulting in higher overall rewards. This approach emphasizes balancing individual and collaborative rewards, offering a promising avenue for optimizing multi-agent systems. The results demonstrate that the combined method offers notable advantages in certain scenarios, accumulating rewards at a higher rate and achieving improved overall performance. This research contributes to the MARL domain by highlighting the potential of combining MADDPG and Go-Explore to enhance the efficiency and effectiveness of multi-agent systems.
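
To make the per-agent architecture concrete, the following is a minimal PyTorch sketch of the networks MADDPG typically assigns to each agent: a decentralized actor that maps the agent's own observation to an action, and a centralized critic (Q network) that scores the joint observations and actions of all agents. The class names, hidden sizes, and dimensions here are illustrative assumptions, not the paper's implementation, and the Go-Explore component (archiving promising states and returning to them before exploring further) is omitted.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Decentralized policy network: maps one agent's observation to an action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # actions bounded in [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class CentralizedCritic(nn.Module):
    """Centralized Q network: scores the joint observations and actions of all agents."""
    def __init__(self, total_obs_dim: int, total_act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(total_obs_dim + total_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q value for the joint state-action
        )

    def forward(self, all_obs: torch.Tensor, all_acts: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([all_obs, all_acts], dim=-1))


# Illustrative setup: 3 agents, each with an 8-dim observation and 2-dim action.
n_agents, obs_dim, act_dim = 3, 8, 2
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critics = [CentralizedCritic(n_agents * obs_dim, n_agents * act_dim) for _ in range(n_agents)]

obs = torch.randn(n_agents, obs_dim)                              # one observation per agent
acts = torch.stack([a(o) for a, o in zip(actors, obs)])           # decentralized action selection
q_values = [c(obs.flatten(), acts.flatten()) for c in critics]    # centralized evaluation
```

Each agent acts from its local observation only, while its critic conditions on the full joint information during training; this centralized-training, decentralized-execution split is what lets MADDPG stabilize learning in multi-agent settings.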
