Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Guojian Wang, Faguo Wu, Xiao Zhang, Ning Guo, Zhiming Zheng

doi:10.1016/j.knosys.2023.111334

Abstract

Deep reinforcement learning (DRL) faces significant challenges in hard-exploration tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods rely on complex architectures to estimate state novelty or introduce sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the agent’s policy away from suboptimal solutions by treating previous offline demonstrations as references. Specifically, this approach gradually expands the exploration scope of the agent and pursues optimality through constrained optimization. Additionally, we introduce a novel policy-gradient-based optimization algorithm that uses adaptive clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a derivation of worst-case approximation error bounds, which supports the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid world mazes and several MuJoCo tasks. The experimental results demonstrate significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. The reported quantitative metrics further support these findings. The code used in the study is available at https://github.com/buaawgj/TACE.
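
To make the idea of a clipped trajectory-distance reward more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a mean Euclidean distance between time-aligned states and a fixed clipping threshold; the function names `trajectory_distance` and `clipped_distance_reward` are hypothetical, and the abstract does not state which distance measure the paper uses or how the threshold is adapted during training.

```python
import math


def trajectory_distance(traj_a, traj_b):
    """Mean Euclidean distance between time-aligned states.

    Assumption: only the first min(len) steps are compared. The paper's
    actual distance measure is not given in the abstract.
    """
    steps = min(len(traj_a), len(traj_b))
    if steps == 0:
        return 0.0
    total = sum(math.dist(traj_a[t], traj_b[t]) for t in range(steps))
    return total / steps


def clipped_distance_reward(trajectory, demonstrations, threshold):
    """Bonus that grows with distance to the nearest reference
    demonstration and is capped at `threshold`.

    Assumption: `threshold` is passed in as a plain number; an adaptive
    schedule would update it between training iterations.
    """
    if not demonstrations:
        return 0.0
    nearest = min(trajectory_distance(trajectory, d) for d in demonstrations)
    return min(nearest, threshold)


# Toy 2D trajectories (illustrative data only).
demo = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
near = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
far = [(0.0, 0.0), (0.0, 3.0), (0.0, 6.0)]

near_bonus = clipped_distance_reward(near, [demo], threshold=2.0)
far_bonus = clipped_distance_reward(far, [demo], threshold=2.0)
```

In this toy setup the trajectory that stays close to the demonstration earns a small bonus, while the one that moves far away earns the capped maximum, so the bonus pushes the policy away from the references without growing without bound.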

More From: Knowledge-Based Systems

Similar Papers

Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain
Jianye Hao ... Peng Liu
IEEE Transactions on Neural Networks and Learning Systems | VOL. 35
01 Jul 2024

Single and Multi-Agent Deep Reinforcement Learning for AI-Enabled Wireless Networks: A Tutorial
Amal Feriani ... Ekram Hossain
IEEE Communications Surveys & Tutorials | VOL. 23
01 Jan 2020

Deep Reinforcement Learning
Aske Plaat
01 Jan 2021

Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle
Qilei Zhang ... Qixin Sha
IEEE Access | VOL. 8
01 Jan 2020
