Modeling Bellman-error with logistic distribution with applications in reinforcement learning

Outongyi Lv, Bingxin Zhou, Lin F. Yang

doi:10.1016/j.neunet.2024.106387

Abstract

In modern Reinforcement Learning (RL), optimizing the Bellman error is a critical element of many algorithms, notably deep Q-Learning and related methods. Traditional approaches predominantly use the mean-squared Bellman error (MSELoss) as the standard loss function. However, the assumption that Bellman errors follow a Gaussian distribution may oversimplify the nuanced characteristics of RL applications. In this work, we revisit the distribution of the Bellman error in RL training and demonstrate that it tends to follow the Logistic distribution rather than the commonly assumed Normal distribution. We propose replacing MSELoss with a Logistic maximum likelihood function (LLoss) and rigorously test this hypothesis through extensive numerical experiments across diverse online and offline RL environments. Our findings consistently show that integrating the Logistic correction into the loss functions of various baseline RL methods leads to superior performance compared to their MSE counterparts. Additionally, we employ Kolmogorov–Smirnov tests to substantiate that the Logistic distribution offers a more accurate fit for Bellman errors. This study also offers a novel theoretical contribution by establishing a clear connection between the distribution of the Bellman error and the practice of proportional reward scaling, a common technique for performance enhancement in RL. Moreover, we explore the sample-accuracy trade-off involved in approximating the Logistic distribution, leveraging the Bias–Variance decomposition to avoid excessive computational cost. The theoretical and empirical insights presented in this study lay a significant foundation for future research, potentially advancing methodologies and understanding in RL, particularly in the distribution-based optimization of the Bellman error.
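To make the contrast between MSELoss and a Logistic likelihood concrete, the following is a minimal sketch, not the paper's exact LLoss. It assumes Bellman errors are modeled as Logistic with location 0 and a fixed scale, and it uses the textbook negative log-likelihood of that distribution. The function names `logistic_nll` and `mse`, the `scale` parameter, and the sample error values are all hypothetical and chosen for illustration only.

```python
import math


def logistic_nll(errors, scale=1.0):
    """Mean negative log-likelihood of errors under Logistic(0, scale).

    Assumption: location is fixed at 0 and `scale` is a hypothetical,
    user-chosen constant. Per-sample NLL with z = |e| / scale is
        z + 2 * log(1 + exp(-z)) + log(scale),
    written in terms of |e| so that exp() never overflows.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    total = 0.0
    for e in errors:
        z = abs(e) / scale
        total += z + 2.0 * math.log1p(math.exp(-z)) + math.log(scale)
    return total / len(errors)


def mse(errors):
    """Mean-squared error, the conventional Bellman-error loss."""
    return sum(e * e for e in errors) / len(errors)


# Hypothetical Bellman errors (TD target minus Q estimate), one outlier.
bellman_errors = [0.1, -0.3, 0.05, 4.0]
print("MSE         :", mse(bellman_errors))
print("Logistic NLL:", logistic_nll(bellman_errors, scale=1.0))
```

Under this form the penalty grows roughly linearly in the magnitude of a large error rather than quadratically, so a single outlier influences the loss less than it does under MSE. Whether that matches the behavior of the LLoss defined in the full text should be checked against the paper's own definition.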

Full Text