Abstract

Reinforcement learning algorithms based on policy gradients may fall into local optima due to vanishing gradients during the update process, which in turn limits the exploration ability of the reinforcement learning agent. To address this problem, this paper combines the cross-entropy method (CEM) from evolutionary policy search, the maximum mean discrepancy (MMD), and the twin delayed deep deterministic policy gradient (TD3) algorithm to propose a diversity evolutionary policy deep reinforcement learning (DEPRL) algorithm. Using the maximum mean discrepancy as a measure of the distance between different policies, some of the policies in the population maximize both their distance from the previous generation of policies and the cumulative return during the gradient update. Furthermore, combining the cumulative return and the distance between policies as the fitness of the population encourages more diversity in the offspring policies, which in turn reduces the risk of falling into local optima caused by vanishing gradients. Results in the MuJoCo test environments show that DEPRL achieves excellent performance on continuous control tasks; in particular, in the Ant-v2 environment, the final return of DEPRL is nearly 20% higher than that of TD3.
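
As a rough illustration of how MMD can serve as a distance between policies, the sketch below estimates the squared MMD between two batches of actions produced by two policies on the same states. The RBF kernel, its bandwidth, the batch size, and the action dimension are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between two sets of action vectors."""
    sq_dists = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-sq_dists / (2 * bandwidth**2))

def mmd_squared(actions_a, actions_b, bandwidth=1.0):
    """Biased estimate of the squared MMD between two policies' action batches."""
    k_aa = rbf_kernel(actions_a, actions_a, bandwidth).mean()
    k_bb = rbf_kernel(actions_b, actions_b, bandwidth).mean()
    k_ab = rbf_kernel(actions_a, actions_b, bandwidth).mean()
    return k_aa + k_bb - 2 * k_ab

# Toy example: actions of two policies on the same batch of states
# (batch size and action dimension chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
acts_new = rng.normal(0.0, 1.0, size=(256, 8))
acts_old = rng.normal(0.5, 1.0, size=(256, 8))
print(mmd_squared(acts_new, acts_old))
```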

Highlights

  • Reinforcement learning [1, 2], as an important branch of machine learning [3, 4], has always been a research hotspot

  • We propose the diversity evolutionary policy deep reinforcement learning (DEPRL) algorithm, which combines the cross-entropy method (CEM) with TD3 and measures the distance between different policies using the maximum mean discrepancy (MMD)

  • Some policies in the current generation maximize the cumulative return while also maximizing their distance from the previous generation of policies, yielding markedly different policies that widen the scope of exploration (see the sketch after this list)
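
The following minimal sketch illustrates how a CEM-style population update might combine the episode return and the distance to the previous generation's policy into a single fitness score. The toy return and distance functions, the diversity weight, and the elite fraction are placeholder assumptions for illustration, not the functions or values used in DEPRL.

```python
import numpy as np

rng = np.random.default_rng(1)
PARAM_DIM, POP_SIZE, ELITE_FRAC, DIVERSITY_WEIGHT = 16, 10, 0.5, 0.3  # illustrative values

def episode_return(params):
    """Placeholder for rolling out the policy with these parameters in the environment."""
    return -np.sum((params - 1.0) ** 2)          # toy objective standing in for the return

def policy_distance(params, prev_params):
    """Placeholder for the MMD between this policy and the previous generation's policy."""
    return np.linalg.norm(params - prev_params)  # simple surrogate for illustration

mean, std = np.zeros(PARAM_DIM), np.ones(PARAM_DIM)
for generation in range(50):
    prev_mean = mean.copy()
    population = mean + std * rng.standard_normal((POP_SIZE, PARAM_DIM))
    # Fitness combines the cumulative return with the distance to the previous generation.
    fitness = np.array([episode_return(p) + DIVERSITY_WEIGHT * policy_distance(p, prev_mean)
                        for p in population])
    elites = population[np.argsort(fitness)[-int(POP_SIZE * ELITE_FRAC):]]
    mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3  # CEM distribution update
```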

Introduction

Reinforcement learning [1, 2], as an important branch of machine learning [3, 4], has always been a research hotspot. Value-based deep reinforcement learning methods estimate the value function with a neural network and use the value function output by the network to guide the agent's action selection, such as the deep Q network (DQN) algorithm [12]. Policy-based deep reinforcement learning methods parameterize the policy and optimize it by learning the parameters, so that the agent obtains the largest cumulative return, such as the deterministic policy gradient (DPG) algorithm [5]. Deep reinforcement learning methods based on the actor-critic structure combine value-based and policy-based methods, learning a policy while fitting a value function, such as the deep deterministic policy gradient (DDPG) algorithm. Although actor-critic methods have the advantages of both value-based and policy-based methods, they also inherit the shortcoming of the policy gradient algorithm: the policy update can fall into a local optimum when the gradient vanishes.
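
For readers unfamiliar with the actor-critic structure mentioned above, the following PyTorch sketch shows the basic DDPG-style update pattern: the critic is regressed toward (precomputed) TD targets, and the actor is updated by the deterministic policy gradient through the critic. The network sizes, learning rates, and single-step update are simplified assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 11, 3  # illustrative dimensions

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(states, actions, targets):
    """One actor-critic step: fit the critic to TD targets, then ascend the critic w.r.t. the actor."""
    # Critic: minimize squared error between Q(s, a) and the (precomputed) TD target.
    q = critic(torch.cat([states, actions], dim=1))
    critic_loss = ((q - targets) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, i.e. maximize Q(s, actor(s)).
    actor_loss = -critic(torch.cat([states, actor(states)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Toy batch standing in for replay-buffer samples
states = torch.randn(32, state_dim)
actions = torch.randn(32, action_dim)
targets = torch.randn(32, 1)
update(states, actions, targets)
```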
