Abstract
Stochastic exploration is key to the success of the deep Q-network (DQN) algorithm. However, most existing stochastic exploration approaches either explore actions heuristically, regardless of their Q values, or couple the sampling with Q values, which inevitably introduces bias into the learning process. In this article, we propose a novel preference-guided ϵ-greedy exploration algorithm that efficiently facilitates exploration for DQN without introducing additional bias. Specifically, we design a dual architecture consisting of two branches: one is a copy of DQN, namely, the Q branch; the other, which we call the preference branch, learns the action preference that the DQN implicitly follows. We theoretically prove that the policy improvement theorem holds for the preference-guided ϵ-greedy policy and experimentally show that the inferred action preference distribution aligns with the landscape of the corresponding Q values. Intuitively, preference-guided ϵ-greedy exploration motivates the DQN agent to take diverse actions: actions with larger Q values are sampled more frequently, while those with smaller Q values still have a chance to be explored, thus encouraging exploration. We comprehensively evaluate the proposed method by benchmarking it against well-known DQN variants in nine different environments. Extensive results confirm the superiority of our proposed method in terms of performance and convergence speed.
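To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of how a preference-guided ϵ-greedy policy might select actions: with probability 1 − ϵ it exploits the greedy action, and with probability ϵ it samples from a learned preference distribution over actions instead of sampling uniformly. The function name, the use of raw preference logits, and the softmax parameterization are illustrative assumptions.

```python
import numpy as np

def preference_guided_epsilon_greedy(q_values, preference_logits, epsilon, rng):
    """Sketch of a preference-guided epsilon-greedy action selector.

    q_values:          per-action Q estimates from the Q branch.
    preference_logits: per-action scores from the (assumed) preference branch.
    epsilon:           exploration probability.
    rng:               numpy random generator.
    """
    if rng.random() < epsilon:
        # Explore: sample from the learned preference distribution
        # (numerically stable softmax over the preference logits),
        # so high-preference actions are drawn more often but every
        # action keeps a nonzero probability of being tried.
        logits = preference_logits - preference_logits.max()
        probs = np.exp(logits) / np.exp(logits).sum()
        return int(rng.choice(len(q_values), p=probs))
    # Exploit: greedy action with respect to the Q values.
    return int(np.argmax(q_values))
```

Compared with standard ϵ-greedy, the only change is the exploration branch: uniform sampling is replaced by sampling from the preference distribution, which is what lets exploration remain diverse without biasing the Q-learning targets themselves.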
IEEE Transactions on Neural Networks and Learning Systems