Abstract

Multiagent reinforcement learning (RL) is widely applied and can solve many real-world problems. In a multiagent RL system, a global critic network guides each agent's policy update so that the agents learn the strategy most beneficial to the collective. However, the global critic also makes each agent's learning dependent on the other agents' changing strategies, which leads to unstable learning. To address this problem, we propose dynamic decomposed multiagent deep deterministic policy gradient (DD-MADDPG), a new architecture that considers both global and local evaluations and adaptively adjusts each agent's attention to the two. In addition, the experience replay buffer used by multiagent deep deterministic policy gradient (MADDPG) retains outdated experience, and the outdated strategies of other agents further disturb the learning of the current agent. To reduce this influence, we propose TD-error- and time-based experience sampling (T2-PER) on top of DD-MADDPG. We evaluate the proposed algorithm in terms of learning stability and the average return obtained by the agents, with experiments in the multiagent particle environment (MPE). The results show that the proposed method is more stable and learns more efficiently than MADDPG and has a certain generalization ability.
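To make the two ideas in the abstract concrete, the sketch below illustrates (a) blending a global and a local critic evaluation with an adaptive attention weight and (b) a T2-PER-style sampling priority that combines TD-error magnitude with the age of a transition. This is a minimal illustration under our own assumptions, not the authors' implementation; the names `alpha`, `lam`, and `age_decay`, and the exact functional forms, are hypothetical, since the abstract does not give the paper's formulas.

```python
# Illustrative sketch only (assumed forms, not the paper's equations).
import numpy as np


def blended_value(q_global: float, q_local: float, alpha: float) -> float:
    """Weighted combination of global and local critic estimates.

    `alpha` in [0, 1] is the agent's attention to the global evaluation;
    DD-MADDPG adjusts this attention dynamically during training (the exact
    update rule is defined in the paper, not reproduced here).
    """
    return alpha * q_global + (1.0 - alpha) * q_local


def t2_per_priority(td_error: float, age: int,
                    lam: float = 0.5, age_decay: float = 1e-3) -> float:
    """Hypothetical priority mixing TD-error magnitude and recency.

    Older transitions (larger `age`, in steps since insertion) receive a lower
    recency score, so stale experience generated under other agents' outdated
    policies is sampled less often.
    """
    recency = np.exp(-age_decay * age)
    return lam * abs(td_error) + (1.0 - lam) * recency


# Usage: sample a mini-batch from a small buffer in proportion to priority.
rng = np.random.default_rng(0)
td_errors = np.array([0.9, 0.1, 0.4, 0.05])
ages = np.array([10, 500, 50, 2000])
priorities = np.array([t2_per_priority(e, a) for e, a in zip(td_errors, ages)])
probs = priorities / priorities.sum()
batch_indices = rng.choice(len(priorities), size=2, replace=False, p=probs)
```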
