Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Baturay Saglam, Furkan Burak Mutlu, Dogan Can Cicek, Suleyman Serdar Kozat

doi:10.1007/s11063-024-11461-y

Abstract

Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant that reduces this underestimation bias in deterministic policy gradients. Because the weights of a linear combination of two approximate critics are sampled from a highly shrunk estimation-bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves on the baseline actor-critic algorithm in most of the environments tested.
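The abstract contrasts two kinds of bootstrapped targets: a pessimistic one that bootstraps from the smaller of two critics (the clipped double Q-learning target used by deterministic actor-critic methods such as TD3, assumed here as the baseline), and one built from a linear combination of two critics whose weight is sampled from an interval. The plain-Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. Combining the smaller and larger critic estimates, sampling the weight `beta` uniformly, and the bounds `low` and `high` are all hypothetical choices, since the abstract does not give the interval the paper derives. The Monte Carlo helper illustrates the variance effect in a simplified setting where critic noise is modelled directly: for two independent, unbiased Gaussian estimates with standard deviation sigma, E[min(Q1, Q2)] = mu - sigma/sqrt(pi), so the pessimistic target's negative bias grows with the noise.

```python
import math
import random
import statistics


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double Q-learning target (TD3-style): bootstrap from the smaller critic."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def weighted_double_q_target(reward, done, q1_next, q2_next, beta, gamma=0.99):
    """Bootstrap from a convex combination of the two critics' next-state estimates.

    ASSUMPTION: the combination is taken between the smaller and the larger
    estimate, so beta = 1 recovers the clipped (pessimistic) target and
    beta = 0 the optimistic max.
    """
    lower, upper = min(q1_next, q2_next), max(q1_next, q2_next)
    mixed = beta * lower + (1.0 - beta) * upper
    return reward + gamma * (1.0 - done) * mixed


def sample_beta(low, high, rng):
    """Draw the combination weight uniformly from [low, high].

    HYPOTHETICAL: `low` and `high` are placeholders; the paper derives its own
    interval from a statistical analysis that this sketch does not reproduce.
    """
    return rng.uniform(low, high)


def min_bias(sigma, true_q=0.0, n=50_000, seed=0):
    """Monte Carlo estimate of E[min(Q1, Q2)] - true_q for two independent,
    unbiased Gaussian critic estimates with standard deviation sigma."""
    rng = random.Random(seed)
    samples = [min(rng.gauss(true_q, sigma), rng.gauss(true_q, sigma)) for _ in range(n)]
    return statistics.fmean(samples) - true_q


def analytic_min_bias(sigma):
    """Closed form for the same quantity: E[min(X1, X2)] - mu = -sigma / sqrt(pi)."""
    return -sigma / math.sqrt(math.pi)


if __name__ == "__main__":
    rng = random.Random(42)
    beta = sample_beta(0.4, 0.6, rng)  # hypothetical interval, for illustration only
    print(clipped_double_q_target(1.0, 0.0, 10.0, 12.0))
    print(weighted_double_q_target(1.0, 0.0, 10.0, 12.0, beta))
    # The min-based target's bias scales linearly with the critics' noise level.
    for sigma in (0.5, 1.0, 5.0):
        print(sigma, round(min_bias(sigma), 3), round(analytic_min_bias(sigma), 3))
```

Moving `beta` toward 1 keeps the target near the clipped (pessimistic) estimate, and moving it toward 0 pushes it toward the optimistic maximum; how to choose the sampling interval is the question the paper's analysis addresses and is not reproduced here.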

Journal: Neural Processing Letters
Publication Date: Mar 2, 2024
License type: CC BY 4.0

Similar Papers

Value activation for bias alleviation: Generalized-activated deep double deterministic policy gradients
Jiafei Lyu ... Xiu Li
Neurocomputing, Vol. 518, 04 Nov 2022

Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods
Baturay Saglam ... Dogan C Cicek
01 Nov 2021

Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks
Haobo Jiang ... Jin Xie
Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, 18 May 2021

Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks
Haobo Jiang ... Guangyu Li
IEEE Transactions on Neural Networks and Learning Systems, Vol. 35, 01 Apr 2024
