Abstract
Although Q-learning has achieved remarkable success in many practical settings, it often suffers from overestimation in stochastic environments, which is commonly regarded as a shortcoming of the algorithm. Overestimated values arise from the estimate of the next-state Q value, a phenomenon well known as maximization bias. In this paper, we propose a more accurate method for estimating the Q value that decomposes the Q value and re-evaluates it with similar samples under linear function approximation. Specifically, we reformulate the parameterized incremental update formula of Q-learning and show that the new formula is equivalent to the original one. We then propose a new parameterized incremental update formula that addresses the overestimation problem, along with a more accurate computation method applicable to problems with continuous state spaces and stochastic environments. Experimentally, compared with Doubly Bounded Q-learning and other Q-learning-based methods, the new algorithm improves performance by more than 31% on Mountain Car and Cart Pole. The algorithm is also robust to the choice of learning rate and memory capacity. Finally, we discuss the practical applicability of our algorithm through an analysis of its time consumption.
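For context, the sketch below illustrates the standard incremental Q-learning update with linear function approximation that the abstract refers to, and marks where the maximization bias enters: the max over noisy next-state value estimates in the bootstrapped target. This is background only, not the paper's proposed decomposition method; the names (phi, weights, transition) are illustrative assumptions.

```python
# Background sketch: standard semi-gradient Q-learning with linear function
# approximation. Not the paper's proposed method; names are illustrative.
import numpy as np

def linear_q(phi, weights, state, action):
    """Q(s, a) approximated as the dot product of a feature vector and weights."""
    return phi(state, action) @ weights

def q_learning_update(weights, phi, transition, actions, alpha=0.1, gamma=0.99):
    """One incremental update of the weight vector from a single transition."""
    state, action, reward, next_state, done = transition
    # Bootstrapped target: the max over next-state action values.
    # Taking the max of noisy estimates tends to overestimate the true value
    # in stochastic environments -- the maximization bias the paper addresses.
    next_q = 0.0 if done else max(linear_q(phi, weights, next_state, a) for a in actions)
    target = reward + gamma * next_q
    td_error = target - linear_q(phi, weights, state, action)
    # Semi-gradient update of the linear parameters.
    return weights + alpha * td_error * phi(state, action)
```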