Abstract
The traditional deep deterministic policy gradient (DDPG) algorithm suffers from slow convergence and a tendency to fall into local optima. To address these two problems, a DDPG algorithm based on a double network prioritized experience replay mechanism (DNPER-DDPG) is proposed in this paper. First, the value function is approximated by two critic networks, and the smaller of the two action-value estimates is used as the update target for the actor policy network, which reduces the chance of converging to a locally optimal policy. Then, the Q values produced by the two networks and the immediate reward returned by the environment are used as the criterion for prioritization, ranking the importance of samples in the experience replay buffer to improve the convergence speed of the algorithm. Finally, the improved method is evaluated on classic control environments from OpenAI Gym, and the results show that the proposed method achieves faster convergence and higher cumulative reward than the comparison algorithms.
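The two mechanisms described above can be sketched roughly as follows. This is a minimal, hypothetical illustration rather than the authors' code: it assumes a PyTorch-style setup, and the function name, the discount factor, and the exact priority formula combining the two Q estimates and the immediate reward are assumptions made for illustration only.

    # Hypothetical sketch of the double-critic target and sample priority.
    import torch

    def td_target_and_priority(reward, next_q1, next_q2, done, gamma=0.99):
        """Build the TD target from the minimum of two target-critic estimates
        (a pessimistic value that curbs convergence to poor local policies) and
        assign a sample priority that mixes the two Q estimates with the
        immediate reward. The priority formula here is an assumption."""
        min_next_q = torch.min(next_q1, next_q2)            # pessimistic estimate
        target = reward + gamma * (1.0 - done) * min_next_q
        # Hypothetical priority: larger when the critics disagree or the reward is large.
        priority = torch.abs(next_q1 - next_q2) + torch.abs(reward)
        return target, priority

In this sketch the target is what the critics regress toward, while the priority value would be written back into the replay buffer so that informative transitions are replayed more often.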
Highlights
With the development of artificial intelligence, reinforcement learning has achieved notable results in discrete action spaces [1]–[3]
A deep deterministic policy gradient algorithm based on a double network prioritized experience replay (ER) mechanism is proposed in this paper
To reduce the probability of getting stuck in a local optimum, two critic networks are introduced into the structure of the algorithm
Summary
With the development of artificial intelligence, reinforcement learning has achieved notable results in discrete action spaces [1]–[3]. The stochastic weight averaging method was introduced to reduce the influence of noise in the gradient estimator during training; it was tested on the continuous action space tasks of Atari and MuJoCo, thereby increasing the stability of the training process. A parallel actor network was introduced to speed up training, and prioritized experience replay was introduced to raise sample utilization. A reinforcement learning method combining prioritized experience replay and DDPG was proposed in [13]. This paper introduces the basic principle of DDPG, elaborates its network structure and important parameters, and identifies its shortcomings in handling continuous action space tasks. To improve the convergence of the algorithm, a priority function for samples in the experience replay mechanism is proposed.
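For the priority-based replay mentioned above, a compact sketch of a proportional prioritized replay buffer is given below, under the assumption that each transition's priority is computed from the two Q values and the immediate reward as outlined in the abstract. The class name, the alpha exponent, and the storage layout are illustrative, not taken from the paper.

    # Hypothetical proportional prioritized replay buffer.
    import numpy as np

    class PrioritizedReplayBuffer:
        def __init__(self, capacity, alpha=0.6):
            self.capacity = capacity
            self.alpha = alpha                # how strongly priorities bias sampling
            self.storage = []                 # transitions: (s, a, r, s_next, done)
            self.priorities = np.zeros(capacity, dtype=np.float64)
            self.pos = 0

        def add(self, transition, priority):
            if len(self.storage) < self.capacity:
                self.storage.append(transition)
            else:
                self.storage[self.pos] = transition
            self.priorities[self.pos] = max(priority, 1e-6)   # avoid zero probability
            self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size):
            # Sample indices with probability proportional to priority**alpha.
            p = self.priorities[:len(self.storage)] ** self.alpha
            p /= p.sum()
            idx = np.random.choice(len(self.storage), batch_size, p=p)
            return [self.storage[i] for i in idx], idx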