Abstract

Accurate value function estimation is vital in Deep Reinforcement Learning (DRL) so that the agent executes proper actions rather than suboptimal ones. However, existing actor-critic methods suffer to varying degrees from underestimation or overestimation bias, which negatively affects their performance. In this paper, we reveal a simple but effective principle: proper value correction benefits bias alleviation. To this end, we propose the generalized-activated weighting operator, which uses any non-decreasing function, termed an activation function, as a weight for better value estimation. In particular, we integrate the generalized-activated weighting operator into value estimation and introduce a novel algorithm, Generalized-activated Deep Double Deterministic Policy Gradients (GD3). We theoretically show that GD3 is capable of alleviating the potential estimation bias. Interestingly, we find that simple activation functions yield satisfying performance with no additional tricks and can contribute to faster convergence. Experimental results on numerous challenging continuous control tasks show that GD3 with task-specific activation functions outperforms common baseline methods. We also find that fine-tuning the polynomial activation function achieves superior results on most tasks. Code will be available upon publication.
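To make the idea of the generalized-activated weighting operator concrete, below is a minimal illustrative sketch, not the authors' reference implementation. It assumes the operator forms an f-weighted average of Q-values over sampled actions, where f is any non-decreasing activation (an exponential recovers a softmax-like weighting, and a polynomial corresponds to the fine-tuned variant mentioned above); the function names and example activations are hypothetical.

```python
# Hypothetical sketch of a generalized-activated weighting operator.
# Assumption: the operator returns sum_i f(Q_i) * Q_i / sum_i f(Q_i)
# over Q-values of sampled actions, with f any non-decreasing activation.
import numpy as np

def generalized_activated_value(q_values: np.ndarray, activation) -> float:
    """f-weighted average of sampled Q-values (illustrative, not official)."""
    weights = activation(q_values)
    weights = np.maximum(weights, 1e-8)  # guard against all-zero weights
    return float(np.sum(weights * q_values) / np.sum(weights))

# Example activations (illustrative choices, not from the paper's experiments):
softmax_like = lambda q: np.exp(q - q.max())                    # exponential activation
polynomial   = lambda q: np.clip(q - q.min(), 0.0, None) ** 2   # degree-2 polynomial

q_samples = np.array([1.0, 1.5, 0.8, 1.2])  # Q-values of actions sampled near the target policy
print(generalized_activated_value(q_samples, softmax_like))
print(generalized_activated_value(q_samples, polynomial))
```

Larger or steeper activations place more weight on high Q-values (pushing the estimate toward a maximum, counteracting underestimation), while flatter ones pull it toward a mean (counteracting overestimation), which is the sense in which the choice of activation controls the bias correction.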
