An efficient and robust gradient reinforcement learning: Deep comparative policy

Jiaguo Wang, Meng Yang, Yang Pei, Chao Lei, Wenheng Li

doi:10.3233/jifs-233747

Abstract

Recently, actor-critic architectures such as deep deterministic policy gradient (DDPG) have been shown to learn higher-level concepts in the search for high reward and to generate complex actions in continuous action spaces, and they are widely used in practical applications. However, when the action space is bounded and has dynamic hard margins, training DDPG can be problematic and inefficient. Because real-world actuators always have margins and are subject to interference, the actor network is likely, after initialization, to become stuck at a local optimum on the action-space margin: the actor gradient points outside the action space, but the actuators stop at the margin. If the hard margins are complex, dynamic, and unknown to the DDPG agent, penalty functions cannot be used to recover from the local optimum. Enlarging the random process used for local exploration puts training at risk of failure. Relying solely on the gradient of the critic network to train the actor network is therefore not robust in real environments. To solve this problem, this paper modifies DDPG into deep comparative policy (DCP). Rather than leveraging the critic-to-actor gradient, the core training process of DCP is regulated by a T-fold comparison among randomly proposed adjacent actions. The performance of DDPG, DCP, and related algorithms is tested and compared in two experiments. Our results show that DCP is effective, efficient, and able to perform all tasks that DDPG can perform. More importantly, DCP is less likely to be influenced by the action-space margins: it offers more safety against training failure and local optima, and more robustness in applications with dynamic hard margins in the action space. Another advantage is that no complex penalty for detecting margin contact is required, so the reward function can remain short and simple.
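
The abstract does not spell out the DCP update rule, so the following is only a minimal sketch of what a T-fold comparison among randomly proposed adjacent actions could look like. Everything in it is an assumption made for illustration: the function names (`propose_adjacent_actions`, `comparative_target`), the Gaussian proposal with scale `sigma`, the clipping to `low`/`high` as a stand-in for hard margins, and the toy quadratic `q_fn` used in place of a trained critic. One plausible use of the selected action is as a training target for the actor, in place of the critic-to-actor gradient.

```python
import numpy as np


def propose_adjacent_actions(action, low, high, t_fold, sigma, rng):
    """Sample t_fold random neighbours of `action`, clipped to the hard margins.

    Hypothetical helper: Gaussian noise and clipping are illustrative choices.
    """
    noise = rng.normal(0.0, sigma, size=(t_fold, action.shape[0]))
    return np.clip(action + noise, low, high)


def comparative_target(action, q_fn, low, high, t_fold=8, sigma=0.1, rng=None):
    """Return the best-scoring action among the current one and its neighbours.

    Only feasible (clipped) actions are compared, so no gradient that points
    outside the action space is ever followed.
    """
    rng = np.random.default_rng() if rng is None else rng
    current = np.clip(action, low, high)
    neighbours = propose_adjacent_actions(current, low, high, t_fold, sigma, rng)
    # Keep the current action as a candidate so the score never decreases.
    candidates = np.vstack([current[None, :], neighbours])
    scores = np.array([q_fn(a) for a in candidates])
    return candidates[int(np.argmax(scores))]


def toy_q(a):
    """Toy stand-in for a critic (assumption): best action is 0.3 per dimension."""
    return -float(np.sum((a - 0.3) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    action = np.array([1.0, -1.0])  # start on the action-space margin
    for _ in range(50):
        action = comparative_target(action, toy_q, -1.0, 1.0, rng=rng)
    print(action, toy_q(action))
```

Because the current action is always one of the compared candidates, the selected action can never score worse than the starting one under `q_fn`, and every candidate stays inside the margins by construction. In a full agent, the margins would presumably be enforced by the environment rather than by an explicit clip.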

Journal: Journal of Intelligent & Fuzzy Systems
