Abstract
Human-in-the-loop reinforcement learning (RL) is typically employed to overcome the challenge of sample inefficiency, with a human expert providing advice to the agent when necessary. Existing human-in-the-loop RL (HRL) results mainly focus on discrete action spaces. In this article, we propose a Q-value-dependent policy (QDP)-based HRL (QDP-HRL) algorithm for continuous action spaces. Considering the cognitive cost of human monitoring, the human expert only selectively gives advice in the early stage of agent learning, in which case the agent executes the human-advised action instead of its own. For ease of comparison with the state-of-the-art twin delayed deep deterministic policy gradient (TD3) algorithm, the QDP framework is adapted to TD3 in this article. Specifically, the human expert in QDP-HRL considers giving advice when the difference between the outputs of the twin Q-networks exceeds the maximum difference in the current queue. Moreover, to guide the update of the critic network, an advantage loss function is developed using expert experience and the agent policy, which provides a learning direction for the QDP-HRL algorithm to some extent. To verify the effectiveness of QDP-HRL, experiments are conducted on several continuous action space tasks in the OpenAI Gym environment, and the results demonstrate that QDP-HRL greatly improves learning speed and performance.
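The abstract describes the advice-trigger condition only in words: the expert is consulted when the disagreement between the twin Q-networks exceeds the largest disagreement in a queue of recent values. The following is a minimal, hedged sketch of that trigger in Python; the class and method names (QDPAdviceTrigger, should_request_advice) and the fixed-length sliding window are illustrative assumptions, not the authors' implementation.

```python
from collections import deque

class QDPAdviceTrigger:
    """Illustrative sketch of the QDP advice trigger described in the abstract:
    query the human expert only when the current twin-critic disagreement
    exceeds the maximum disagreement stored in a queue of recent steps."""

    def __init__(self, window_size=100):
        # Queue of recent |Q1 - Q2| disagreements (window size is an assumption).
        self.recent_diffs = deque(maxlen=window_size)

    def should_request_advice(self, q1_value, q2_value):
        # Current disagreement between the twin Q-network outputs.
        diff = abs(q1_value - q2_value)
        # Ask for advice only if it exceeds the maximum in the current queue.
        ask_human = len(self.recent_diffs) > 0 and diff > max(self.recent_diffs)
        self.recent_diffs.append(diff)
        return ask_human


# Hypothetical usage inside a TD3-style training loop:
# q1, q2 = critic_1(state, action), critic_2(state, action)
# if trigger.should_request_advice(q1, q2):
#     action = human_expert_advice(state)  # agent executes the advised action
```

In this reading, a large gap between the twin critics is treated as a proxy for epistemic uncertainty, so expert queries concentrate on states the agent estimates poorly, which is consistent with the abstract's goal of limiting the expert's monitoring cost to the early learning stage.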