Abstract

The deep deterministic policy gradient (DDPG) algorithm, which operates over continuous action spaces, has attracted considerable attention in reinforcement learning. However, its exploration strategy based on dynamic programming within the Bayesian belief state space is inefficient even for simple systems. A second problem is that training data collected sequentially and iteratively, e.g., with autonomous vehicles, are subject to the law of causality and therefore violate the i.i.d. (independent and identically distributed) assumption on the training samples. This often causes the standard bootstrap to fail when learning an optimal policy. In this paper, we propose a framework of m-out-of-n bootstrapped and aggregated multiple deep deterministic policy gradients to accelerate training and improve performance. Experiments on a 2D robot arm game show that the reward gained by the aggregated policy is 10%–50% higher than the rewards gained by the sub-policies. Experiments on The Open Racing Car Simulator (TORCS) demonstrate that the new algorithm learns successful control policies with 56.7% less training time. A convergence analysis is also given from the perspective of probability and statistics. These results verify that the proposed method outperforms existing algorithms in both efficiency and performance.
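
The abstract describes two ingredients: resampling m-out-of-n transitions for each learner and aggregating the sub-policies' continuous actions. The sketch below is our own minimal illustration of those two steps (not the authors' code); the function names, the averaging rule, and the use of NumPy are assumptions for exposition only.

```python
# Minimal sketch of m-out-of-n bootstrap resampling and policy aggregation,
# assuming K trained sub-policies whose continuous actions can be averaged.
import numpy as np

def m_out_of_n_resample(replay_buffer, m, rng):
    """Draw m transitions (with replacement) from a buffer of n > m items."""
    n = len(replay_buffer)
    idx = rng.integers(0, n, size=m)
    return [replay_buffer[i] for i in idx]

def aggregated_action(sub_policies, state):
    """Aggregate sub-policy outputs by averaging their continuous actions."""
    actions = np.stack([pi(state) for pi in sub_policies])
    return actions.mean(axis=0)

# Usage idea: each of K DDPG learners trains on its own m-out-of-n resample,
# and the aggregated policy acts with the averaged action.
rng = np.random.default_rng(0)
# batches = [m_out_of_n_resample(buffer, m=len(buffer) // 2, rng) for _ in range(K)]
```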

Highlights

  • Reinforcement learning is an active branch of machine learning, where an agent tries to maximize the accumulated reward when interacting with a complex and uncertain environment [1, 2]

  • Shi et al. introduced deep soft policy gradient (DSPG) [18], a stable, off-policy, model-free deep RL algorithm that combines policy-based and value-based methods under the maximum entropy RL framework. The authors find that the standard bootstrap is likely to fail when learning an optimal policy, since in most reinforcement learning tasks the sequential and iterative training data are subject to the law of causality, which violates the i.i.d. assumption on the training samples

  • In a classical scenario of reinforcement learning, an agent aims to learn an optimal policy according to the reward function by interacting with the environment E in discrete time steps, where a policy is a map from the state space to the action space [1] (see the sketch after this list)
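
As a concrete illustration of the last highlight, the sketch below represents a deterministic policy μ as a small network mapping a state vector to a bounded continuous action, in the style of DPG/DDPG actors. This is our own illustrative example, not the paper's implementation; the layer sizes and PyTorch usage are assumptions.

```python
# Illustrative only: a deterministic policy mu(s) mapping states to
# bounded continuous actions, as used by DPG/DDPG-style actors.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # squash to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)  # scale to the action bounds
```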


Summary

Introduction

Reinforcement learning is an active branch of machine learning, in which an agent tries to maximize the accumulated reward when interacting with a complex and uncertain environment [1, 2]. Although such tasks could be solved with DQN by discretizing the continuous spaces, discretization may increase the instability of the control system. To overcome this difficulty, the deterministic policy gradient (DPG) algorithm [9] was combined with the DNN technique, producing the deep deterministic policy gradient (DDPG) algorithm [10]. In A3C, interactive learning with the environment is performed in multiple threads at the same time, and each thread summarizes its learning results and stores them in a common place; in this way, A3C avoids overly strong correlation in experience replay and achieves an asynchronous, concurrent learning model. In consideration of these shortcomings of previous work, this paper introduces a simple DRL algorithm that combines the m-out-of-n bootstrap technique [19, 20] with aggregated multiple DDPG structures.
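
For readers unfamiliar with DDPG, the following sketch shows the core off-policy update that each sub-policy in such a framework would run on its bootstrapped minibatch. It is our own simplified rendering of the standard DDPG update, not the authors' code; the actor/critic modules, target copies, optimizers, and hyperparameter values (gamma, tau) are assumed.

```python
# Sketch of one DDPG update on a sampled minibatch (s, a, r, s', done).
# Assumes actor/critic networks and their target copies already exist.
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch  # tensors

    # Critic: minimize TD error against the target networks.
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * critic_t(s2, actor_t(s2))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient, ascend Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft-update the target networks toward the learned networks.
    for p, p_t in zip(critic.parameters(), critic_t.parameters()):
        p_t.data.mul_(1 - tau).add_(tau * p.data)
    for p, p_t in zip(actor.parameters(), actor_t.parameters()):
        p_t.data.mul_(1 - tau).add_(tau * p.data)
```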


