Addressing maximization bias in reinforcement learning with two-sample testing

Martin Waltz, Ostap Okhrin

doi:10.1016/j.artint.2024.104204

Open Access

https://doi.org/10.1016/j.artint.2024.104204

Journal: Artificial Intelligence
Publication Date: Aug 16, 2024
License type: cc-by

Abstract

Value-based reinforcement-learning algorithms have shown strong results in games, robotics, and other real-world applications. Overestimation bias is a known threat to those algorithms and can sometimes lead to dramatic performance decreases or even complete algorithmic failure. We frame the bias problem statistically and consider it an instance of estimating the maximum expected value (MEV) of a set of random variables. We propose the T-Estimator (TE), based on two-sample testing for the mean, which flexibly interpolates between over- and underestimation by adjusting the significance level of the underlying hypothesis tests. We also introduce a generalization, termed the K-Estimator (KE), that obeys the same bias and variance bounds as the TE and relies on a nearly arbitrary kernel function. We introduce modifications of Q-Learning and the Bootstrapped Deep Q-Network (BDQN) using the TE and the KE, and prove convergence in the tabular setting. Furthermore, we propose an adaptive variant of the TE-based BDQN that dynamically adjusts the significance level to minimize the absolute estimation bias. All proposed estimators and algorithms are thoroughly tested and validated on diverse tasks and environments, illustrating the bias control and performance potential of the TE and KE.
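
The abstract does not spell out the estimator's formula, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' definition. It assumes a one-sided two-sample z-test that compares each sample mean against the largest one and averages the means that are not significantly smaller. The names `max_estimator`, `two_sample_estimator`, and `simulate_bias`, the z-test itself, and the restriction of `alpha` to (0, 0.5] are all hypothetical choices made for illustration; the full text should be consulted for the actual TE.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def max_estimator(samples):
    """Largest sample mean: the plain estimator that tends to overestimate."""
    return max(fmean(s) for s in samples)


def two_sample_estimator(samples, alpha=0.1):
    """Average of the sample means not significantly below the largest one.

    Illustrative assumption, not the paper's definition: a one-sided
    two-sample z-test at level alpha in (0, 0.5] decides which means count.
    """
    if not 0.0 < alpha <= 0.5:
        raise ValueError("alpha must lie in (0, 0.5]")
    means = [fmean(s) for s in samples]
    sq_errs = [variance(s) / len(s) for s in samples]  # squared standard errors
    best = max(range(len(means)), key=means.__getitem__)
    z_alpha = NormalDist().inv_cdf(alpha)  # critical value, <= 0 here
    kept = []
    for mean, sq_err in zip(means, sq_errs):
        diff = mean - means[best]
        scale = math.sqrt(sq_err + sq_errs[best])
        if scale > 0:
            stat = diff / scale
        else:
            # Degenerate case: no sampling noise in either variable.
            stat = 0.0 if diff == 0 else -math.inf
        if stat >= z_alpha:
            kept.append(mean)
    return fmean(kept)


def simulate_bias(n_vars=10, n_obs=20, runs=200, alpha=0.1, seed=0):
    """Mean bias of both estimators when every true mean is 0 (so MEV = 0)."""
    rng = random.Random(seed)
    max_total = te_total = 0.0
    for _ in range(runs):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)]
                   for _ in range(n_vars)]
        max_total += max_estimator(samples)
        te_total += two_sample_estimator(samples, alpha)
    return max_total / runs, te_total / runs
```

In this sketch, `alpha=0.5` gives a critical value of zero, so only the largest mean survives and the result equals the plain maximum. Smaller values of `alpha` admit more means into the average and pull the estimate downward, which is one way to picture the interpolation between over- and underestimation described above.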

Similar Papers

Controlling underestimation bias in reinforcement learning via minmax operation
Fanghui Huang ... Wen Jiang
Chinese Journal of Aeronautics | VOL. -
01 Mar 2024

The computational roots of positivity and confirmation biases in reinforcement learning.
Stefano Palminteri ... Maël Lebreton
Trends in Cognitive Sciences | VOL. 26
01 Jul 2022

UAV Dynamic Object Tracking with Lightweight Deep Vision Reinforcement Learning
Hy Nguyen ... Hung Du
Algorithms | VOL. 16
27 Apr 2023

Actor-Critic With Synthesis Loss for Solving Approximation Biases.
Bo-Wen Guo ... Qiang Shen
IEEE Transactions on Cybernetics | VOL. 54
01 Sep 2024
