Sampling diversity driven exploration with state difference guidance

Jiayi Lu, Shuai Han, Shuai Lü, Meng Kang, Junwei Zhang

doi:10.1016/j.eswa.2022.117418

Abstract

Exploration is one of the key challenges in deep reinforcement learning, especially in environments with sparse or deceptive rewards. Exploration methods based on intrinsic rewards can handle such environments. However, existing methods cannot account for both global interaction dynamics and local environment changes at the same time. In this paper, we propose a novel intrinsic reward for off-policy learning. From a global perspective, it encourages the agent to take actions that have not yet been fully learned. From a local perspective, it guides the agent to trigger notable changes in the environment. We also propose a double-actors–double-critics framework for combining intrinsic and extrinsic rewards, which avoids the inappropriate combination of the two used in previous methods. This framework can be applied to off-policy actor–critic algorithms. We provide a comprehensive evaluation of our approach on the MuJoCo benchmark environments. The results show that our method explores effectively in environments with dense, deceptive and sparse rewards. In addition, we conduct thorough ablation and quantitative analyses of the intrinsic rewards. Finally, comparative experiments verify the superiority and rationality of the double-actors–double-critics framework.
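The abstract gives no formulas, so the following is only an illustrative sketch under stated assumptions. It is not the authors' method. It makes two hypothetical assumptions. First, the local "state difference" signal is taken to be the Euclidean distance between consecutive states. Second, extrinsic and intrinsic rewards are kept in separate streams, each with its own discounted return. These are roughly the targets that two separate critics would estimate, in place of a single critic trained on a summed reward. The names `state_difference_bonus` and `SplitReturns` are hypothetical, and the global, action-level term of the paper's reward is not modeled.

```python
import math
from dataclasses import dataclass, field


def state_difference_bonus(state, next_state, scale=1.0):
    """Hypothetical local intrinsic reward: scaled Euclidean distance
    between consecutive states (an assumption, not the paper's formula)."""
    if len(state) != len(next_state):
        raise ValueError("state and next_state must have the same dimension")
    return scale * math.sqrt(sum((b - a) ** 2 for a, b in zip(state, next_state)))


@dataclass
class SplitReturns:
    """Stores extrinsic and intrinsic rewards in separate streams, as two
    separate critics would see them, instead of summing them into one reward."""
    gamma: float = 0.99
    extrinsic: list = field(default_factory=list)
    intrinsic: list = field(default_factory=list)

    def add(self, r_ext, state, next_state):
        # Record the environment reward and the (assumed) state-difference bonus.
        self.extrinsic.append(r_ext)
        self.intrinsic.append(state_difference_bonus(state, next_state))

    def _discounted(self, rewards):
        # Standard discounted return, accumulated backwards in time.
        g = 0.0
        for r in reversed(rewards):
            g = r + self.gamma * g
        return g

    def returns(self):
        """Return (extrinsic_return, intrinsic_return) for the stored trajectory."""
        return self._discounted(self.extrinsic), self._discounted(self.intrinsic)


if __name__ == "__main__":
    buf = SplitReturns(gamma=0.5)
    buf.add(1.0, [0.0, 0.0], [3.0, 4.0])  # large state change, bonus 5.0
    buf.add(0.0, [3.0, 4.0], [3.0, 4.0])  # no state change, bonus 0.0
    buf.add(2.0, [3.0, 4.0], [3.0, 4.0])
    print(buf.returns())  # (1.5, 5.0)
```

Keeping the two returns separate lets each critic learn the scale of its own reward. The two signals can then be weighted or combined at the policy level. Summing them into one scalar before learning gives up that flexibility.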


Journal: Expert Systems With Applications
Publication Date: May 6, 2022
Citations: 1

Similar Papers

SNSE: State Novelty Sampling Exploration
Shaojie Li ... Zhenyu Zhang, 26 Nov 2022

Collaborative training of heterogeneous reinforcement learning agents in environments with sparse rewards: what and when to share?
Alain Andres ... Esther Villar-Rodriguez, Neural Computing and Applications, Vol. 35, 16 Sep 2022

Constrained reinforcement learning from intrinsic and extrinsic rewards
Eiji Uchibe ... Kenji Doya, 01 Jul 2007

Self-Attention based Temporal Intrinsic Reward for Reinforcement Learning
Zhuo Jiang ... Daiying Tian, 22 Oct 2021
