Abstract

The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it suffers from drawbacks such as slow convergence and instability. In contrast, traditional reinforcement learning algorithms with linear function approximation usually converge faster and are more stable, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom exploit the advantages of traditional algorithms to improve DQN. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called minibatch recursive least squares Q-learning (MRLS-Q). Unlike the traditional Q-learning algorithm with linear function approximation, MRLS-Q has a learning mechanism and model structure closer to those of a DQN with only one input layer and one linear output layer. It uses experience replay and minibatch training, and takes the agent's states rather than state-action pairs as inputs. As a result, it can be used alone for low-dimensional problems and can also be seamlessly integrated into DQN as the last layer for high-dimensional problems. In addition, MRLS-Q uses our proposed average RLS optimization technique, so that it achieves better convergence performance whether used alone or integrated with DQN. Finally, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and experimentally investigate the influence of its hyperparameters.
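
For concreteness, the sketch below illustrates the kind of model structure and training mode described above: a linear action-value model that takes state features as input and outputs one value per action, combined with DQN-style experience replay and minibatch sampling. This is only a minimal illustration under assumed names (LinearQ, phi, sample_minibatch); it is not the paper's implementation and omits the RLS and average-RLS updates.

    import random
    from collections import deque

    import numpy as np

    class LinearQ:
        """Linear action-value model Q(s) = W @ phi(s).

        Like the last (linear) layer of a DQN, it maps state features to
        one value per action; states, not state-action pairs, are the input.
        """
        def __init__(self, n_features, n_actions):
            self.W = np.zeros((n_actions, n_features))

        def q_values(self, phi):
            return self.W @ phi                        # vector of action values

        def act(self, phi, epsilon=0.1):
            if random.random() < epsilon:              # epsilon-greedy exploration
                return random.randrange(self.W.shape[0])
            return int(np.argmax(self.q_values(phi)))

    # DQN-style experience replay with minibatch sampling.
    replay = deque(maxlen=10_000)

    def sample_minibatch(batch_size=32):
        # Assumes the buffer already holds at least batch_size transitions.
        return random.sample(replay, batch_size)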

Highlights

  • Reinforcement learning (RL) is an important machine learning methodology for solving sequential decision-making problems

  • To alleviate changes in the features of the same state and to integrate minibatch recursive least squares Q-learning (MRLS-Q) into deep Q-network (DQN), we present a new method to define the feature function of MRLS-Q

  • We demonstrate the effectiveness of MRLS-Q, used alone and as the last layer of DQN, on the CartPole problem and four Atari games, respectively

  • We propose MRLS-Q, a linear recursive least squares (RLS) function approximation algorithm with a learning mechanism similar to that of DQN

Summary

Introduction

Reinforcement learning (RL) is an important machine learning methodology for solving sequential decision-making problems. In traditional RL, Q-learning algorithms often use linear functions to approximate action values; such approximators have better stability and fewer hyperparameters to tune than DQNs [20]. Judging from the DQN's learning mechanism, a perfectly integrated LS-type algorithm should be able to use the inputs of the DQN's last layer for approximating action values and should have the same learning mode as DQN. Although existing LS-type algorithms seem to meet the above requirements to some extent, they are difficult to integrate with DQNs. They use the same experience replay and minibatch learning mode as DQN. They can avoid computing the matrix inverse and are more suitable for online learning by using recursive least squares (RLS). Based on this work and inspired by the work of Levine et al., we propose a novel minibatch RLS Q-learning algorithm with linear function approximation, called MRLS-Q.
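
As a rough sketch of the recursive least squares idea mentioned above (maintaining an inverse correlation matrix so that no explicit matrix inversion is needed), the code below applies a standard rank-one RLS update sample-by-sample over a replayed minibatch. The weight layout (one weight vector per action, a single shared P matrix) and the function names are illustrative assumptions; the paper's average RLS optimization technique is not reproduced here.

    import numpy as np

    def rls_step(w, P, phi, target, forgetting=1.0):
        """One recursive least squares update of a linear estimate w @ phi.

        P tracks the inverse feature-correlation matrix, so the update uses
        a rank-one (Sherman-Morrison) correction instead of a matrix inverse.
        """
        Pphi = P @ phi
        gain = Pphi / (forgetting + phi @ Pphi)        # RLS gain vector
        w = w + gain * (target - w @ phi)              # move the estimate toward the target
        P = (P - np.outer(gain, Pphi)) / forgetting    # update the inverse correlation matrix
        return w, P

    def minibatch_q_update(W, P, batch, gamma=0.99):
        """Sample-by-sample RLS over a replayed minibatch of transitions.

        W holds one weight vector per action; P is shared across actions
        purely for brevity (an assumption, not the paper's scheme).
        """
        for phi_s, a, r, phi_next, done in batch:
            target = r if done else r + gamma * np.max(W @ phi_next)
            W[a], P = rls_step(W[a], P, phi_s, target)
        return W, P

The per-sample loop above is only the simplest way to consume a minibatch; a blockwise or averaged update over the whole batch, in the spirit of the paper's average RLS technique, is also possible but not shown here.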
