Actor-Critic With Synthesis Loss for Solving Approximation Biases

Bo-Wen Guo, Fei Chao, Xiang Chang, Changjing Shang, Qiang Shen

doi:10.1109/tcyb.2024.3388470

Abstract

Approximation biases in value functions are a key problem in reinforcement learning (RL). In particular, existing RL algorithms suffer from overestimation and underestimation biases: the mismatch between actual returns and action-value approximations limits their performance. In this article, we first develop a new synthesis loss function for action-value estimation in RL that integrates a regularization term with a modified "clipped double Q-learning" structure to address both overestimation and underestimation biases. To minimize the differences between action-value estimates and actual returns, we develop a new discrepancy function that determines the type and magnitude of approximation biases. Two coefficients embedded in the synthesis loss are then tuned automatically during training by minimizing this discrepancy function, thereby reducing approximation biases. We further design a new actor-critic (AC) algorithm, named AC with synthesis loss (ACSL), by integrating the synthesis loss function with an error-controlled mechanism. Experimental results on continuous control tasks show that ACSL outperforms other cutting-edge RL methods on many tasks, and that the proposed synthesis loss function is easy to incorporate into other algorithms, where it significantly reduces approximation biases while improving performance. Overall, the proposed method successfully handles many complex continuous control tasks and greatly outperforms other state-of-the-art algorithms on most of them.
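
The abstract does not give the exact form of the synthesis loss, the discrepancy function, or the coefficient-tuning rule. As a rough illustration only, the Python sketch below shows three related ideas: the standard clipped double Q-learning target (as used in TD3) extended with a hypothetical mixing coefficient `beta`; a signed discrepancy between critic estimates and observed returns, whose sign indicates the type of bias and whose absolute value indicates its magnitude; and a simple feedback rule that nudges `beta` to shrink that discrepancy. The names `beta`, `discrepancy`, and `tune_beta`, the min/max blend, and the update rule are all assumptions, not the ACSL authors' formulation. The regularization term and the error-controlled mechanism are omitted because the abstract does not define them.

```python
# Illustrative sketch only: NOT the ACSL authors' implementation.
# The blend coefficient `beta`, the discrepancy definition, and the tuning rule
# are hypothetical stand-ins for components the abstract names but does not define.
from statistics import fmean


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99, beta=1.0):
    """TD target built from two target critics evaluated at (s', a').

    beta = 1.0 gives the standard clipped double Q-learning (TD3) target,
    which uses min(Q1', Q2'). beta < 1.0 mixes in max(Q1', Q2') to
    counteract underestimation (an assumed "modification").
    """
    q_min = min(q1_next, q2_next)
    q_max = max(q1_next, q2_next)
    blended = beta * q_min + (1.0 - beta) * q_max
    # (1 - done) stops bootstrapping at terminal transitions.
    return reward + gamma * (1.0 - done) * blended


def discrepancy(q_estimates, observed_returns):
    """Mean signed gap between critic estimates and observed (e.g. Monte Carlo) returns.

    The sign gives the bias type (> 0 overestimation, < 0 underestimation);
    the absolute value gives its magnitude.
    """
    return fmean(q - g for q, g in zip(q_estimates, observed_returns))


def tune_beta(beta, disc, lr=0.01):
    """Hypothetical feedback rule that nudges beta to shrink the discrepancy.

    Overestimation (disc > 0) raises beta, weighting the pessimistic minimum more;
    underestimation (disc < 0) lowers it. The result is clamped to [0, 1].
    """
    return min(1.0, max(0.0, beta + lr * disc))


if __name__ == "__main__":
    beta = 0.5
    y = clipped_double_q_target(reward=1.0, done=0.0, q1_next=10.0, q2_next=12.0,
                                gamma=0.9, beta=beta)
    disc = discrepancy([5.0, 7.0], [4.0, 6.0])  # critic sits 1.0 above the returns
    beta = tune_beta(beta, disc, lr=0.1)
    print(y, disc, beta)
```

With `beta = 1.0` the target reduces to the usual TD3 minimum; for example, `clipped_double_q_target(1.0, 0.0, 10.0, 12.0, gamma=0.9)` returns 10.0. A feedback rule is used here only because it is the simplest way to drive the discrepancy toward zero; the paper's actual procedure minimizes its discrepancy function and may differ substantially.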

Journal: IEEE Transactions on Cybernetics
Publication Date: Sep 1, 2024
Citations: 1

Similar Papers

Biped dynamic walking using reinforcement learning
Hamid Benbrahim ... Judy A Franklin
Robotics and Autonomous Systems, Vol. 22, 01 Dec 1997

Reward-Punishment Actor-Critic Algorithm Applying to Robotic Non-grasping Manipulation
Taisuke Kobayashi ... Gordon Cheng
01 Aug 2019

Safe Reinforcement Learning using Data-Driven Predictive Control
Mahmoud Selim ... M Watheq El-Kharashi
27 Dec 2022

Controlling underestimation bias in reinforcement learning via minmax operation
Fanghui Huang ... Wen Jiang
Chinese Journal of Aeronautics, 01 Mar 2024
