Controlling underestimation bias in reinforcement learning via minmax operation

Fanghui Huang, Yixin He, Yu Zhang, Xinyang Deng, Wen Jiang

doi:10.1016/j.cja.2024.03.008

Open Access

https://doi.org/10.1016/j.cja.2024.03.008

Journal: Chinese Journal of Aeronautics
Publication Date: Mar 1, 2024
Citations: 1
License type: CC BY-NC-ND

Abstract

Obtaining accurate value estimates and reducing estimation bias are key issues in reinforcement learning. However, current methods that address the overestimation problem tend to introduce underestimation, which poses a challenge for precise decision-making in many fields. To address this issue, we conduct a theoretical analysis of the underestimation bias and propose the minmax operation, which allows flexible control of the estimation bias. Specifically, we select the maximum value of each action from multiple parallel state-action networks to create a new state-action value sequence. Then, the minimum value of this sequence is selected to obtain more accurate value estimates. Moreover, based on the minmax operation, we propose two novel algorithms by combining it with Deep Q-Network (DQN) and Double DQN (DDQN), named minmax-DQN and minmax-DDQN. We also conduct theoretical analyses of the estimation bias and variance caused by the proposed minmax operation, which show that this operation significantly mitigates both underestimation and overestimation biases and leads to unbiased estimation. Furthermore, the variance is also reduced, which helps improve the stability of network training. Finally, we conduct numerous comparative experiments in various environments, which empirically demonstrate the superiority of our method.
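The two selection steps described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `minmax`, the nested-list layout `q_values[k][a]` (network `k`, action `a`, at one fixed state), and the sample numbers are all assumptions made for this example. The abstract does not specify how the resulting value enters the DQN or DDQN learning target, so that part is deliberately left out.

```python
from typing import List, Sequence


def minmax(q_values: Sequence[Sequence[float]]) -> float:
    """Apply the two steps from the abstract to one state's value estimates.

    q_values[k][a] is assumed to hold the estimate from parallel network k
    for action a (hypothetical layout chosen for this sketch).
    """
    if not q_values or not q_values[0]:
        raise ValueError("need at least one network and one action")
    n_actions = len(q_values[0])
    if any(len(row) != n_actions for row in q_values):
        raise ValueError("every network must score the same set of actions")

    # Step 1: for each action, keep the maximum estimate across networks.
    # This yields the "new state-action value sequence".
    per_action_max: List[float] = [
        max(row[a] for row in q_values) for a in range(n_actions)
    ]

    # Step 2: select the minimum value of that sequence.
    return min(per_action_max)


# Two hypothetical networks scoring three actions at the same state.
example = [
    [1.0, 2.5, 0.3],
    [1.4, 2.0, 0.9],
]
result = minmax(example)  # per-action maxima are [1.4, 2.5, 0.9]
```

With a single network, step 1 leaves the estimates unchanged, so the sketch reduces to taking the minimum over actions; adding networks can only raise each per-action maximum, which is one way to read the abstract's claim that the operation gives flexible control over the bias.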

Similar Papers

Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks
Haobo Jiang ... Jin Xie
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 35
18 May 2021

Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks
Haobo Jiang ... Guangyu Li
IEEE Transactions on Neural Networks and Learning Systems | VOL. 35
01 Apr 2024

Actor-Critic With Synthesis Loss for Solving Approximation Biases
Bo-Wen Guo ... Qiang Shen
IEEE Transactions on Cybernetics | VOL. 54
01 Sep 2024

Multisource Transfer Double DQN Based on Actor Learning
Jie Pan ... Xuesong Wang
IEEE Transactions on Neural Networks and Learning Systems | VOL. 29
01 Jun 2018
