Abstract

Recently, reinforcement learning has achieved remarkable results in natural science, engineering, medicine, and operations research. Reinforcement learning addresses sequential decision problems and optimizes long-term returns; this long-term view is critical for finding optimal solutions to many problems. Existing multi-agent reinforcement learning methods usually update the state-action value function slowly, and the rewards obtained by agents are low. This paper presents a Dueling Multi-Agent Deep Deterministic Policy Gradient (Dueling MADDPG) method, built on MADDPG, which modifies the critic's network structure. The main contribution is to add two subnetworks behind the critic network of the traditional MADDPG method. This design allows the critic network to update its parameters faster and the agents to receive higher rewards. Finally, to verify the effectiveness of the network structure, the improved framework is compared with the traditional MADDPG, DQN, and DDPG methods in a simulation environment.
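The abstract does not specify the two critic subnetworks in detail, but a "dueling" critic conventionally splits into a state-value stream and an advantage stream that are recombined into Q-values. Below is a minimal NumPy sketch of that conventional dueling aggregation, assuming the standard formulation Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); the array shapes and the `dueling_q` helper are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dueling_q(value, advantage):
    # Standard dueling aggregation: combine the state-value stream V(s)
    # and the advantage stream A(s, a), centering the advantages so the
    # decomposition is identifiable.
    return value + advantage - advantage.mean(axis=-1, keepdims=True)

# Hypothetical critic outputs for a batch of 4 states and 3 actions.
value = rng.normal(size=(4, 1))      # state-value stream V(s)
advantage = rng.normal(size=(4, 3))  # advantage stream A(s, a)

q = dueling_q(value, advantage)
print(q.shape)  # (4, 3)
```

Because the advantages are mean-centered, the per-state mean of the resulting Q-values recovers V(s), which is what lets the value stream learn independently of action choice.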
