Abstract

Greedy-step Q-learning (GQL) can effectively accelerate the Q-value update process. However, because it is an improved version of Q-learning, it also suffers from Q-value overestimation. Since GQL uses two max operators to iteratively compute the Q-value, many existing solutions for reducing Q-value estimation bias are not applicable to it. To address this issue, this study proposes an alternated greedy-step update (AGU) framework consisting of two independent Q-value estimators: one estimator determines the time step that maximizes the estimated n-step return, and the other provides the target value, computed at the determined time step, that is used to update the first estimator. The convergence of the AGU framework is proved theoretically. In addition, an alternated greedy-step deterministic policy gradient (AGDPG) algorithm for continuous-action tasks is proposed by combining the AGU framework with deep deterministic policy gradient (DDPG). Experiments with AGDPG on continuous-action MuJoCo tasks demonstrate its superior performance.
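The following is a minimal, hypothetical sketch of the decoupling idea described above in a tabular setting: one estimator (here called q_a) selects the greedy bootstrap step that maximizes the n-step return, while the other estimator (q_b) evaluates the target at that step. The function name, the tabular representation, and the exact form of the return are illustrative assumptions and do not reproduce the paper's implementation.

```python
import numpy as np

def alternated_greedy_step_target(rewards, next_states, q_a, q_b, gamma=0.99):
    """Sketch of an AGU-style target for a length-N segment of experience.

    rewards[i]     : reward received at step i of the segment
    next_states[i] : index of the state reached after step i
    q_a, q_b       : two independent Q tables of shape (num_states, num_actions)
    """
    horizon = len(rewards)
    returns_a = np.empty(horizon)
    partial = 0.0
    for n in range(horizon):
        partial += (gamma ** n) * rewards[n]
        # n-step return bootstrapped with estimator A (used only to select the step).
        returns_a[n] = partial + (gamma ** (n + 1)) * np.max(q_a[next_states[n]])
    n_star = int(np.argmax(returns_a))  # greedy step chosen by estimator A

    # Evaluate the target with the other estimator B at the selected step,
    # decoupling step selection from value evaluation to limit overestimation.
    discounted_rewards = sum((gamma ** i) * rewards[i] for i in range(n_star + 1))
    return discounted_rewards + (gamma ** (n_star + 1)) * np.max(q_b[next_states[n_star]])
```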
