Abstract

In this paper we present and theoretically study an Approximate Policy Iteration (API) method, called API-BRMε, that uses an efficient implementation of incremental Support Vector Regression (SVR) to approximate the value function, allowing it to generalize to Reinforcement Learning (RL) problems with continuous (or large) state spaces. API-BRMε is formulated as a non-parametric regularization method derived from Bellman Residual Minimization (BRM) that minimizes the variance of the problem. The proposed method is incremental and can therefore be applied to the online agent-interaction setting of RL. Since it builds on SVR, which relies on convex optimization, it finds the global solution of the underlying problem. API-BRMε with SVR can be viewed as a regularization problem with the ε-insensitive loss; compared with the standard squared loss commonly used in regularization, this naturally yields a sparse solution for the approximating function. We extensively analyze the statistical properties of API-BRMε, establishing a bound that controls the performance loss of the algorithm under assumptions on the kernel and assuming that the collected samples are non-i.i.d., following a β-mixing process. Experimental evidence and performance results on well-known RL benchmarks are also presented.
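To illustrate the role of the ε-insensitive loss in value-function approximation, the following minimal sketch fits a kernel SVR to one-step Bellman targets inside a simple policy-evaluation loop. This is not the paper's API-BRMε algorithm: the BRMε objective and the incremental SVR solver are replaced here by a plain batch scikit-learn SVR, and the `transitions` layout, `policy` callable, and hyperparameters are hypothetical. It only shows how the ε-insensitive fit produces a sparse set of support vectors for the approximating function.

```python
# Illustrative sketch only (assumed data layout and hyperparameters),
# not the paper's API-BRMe implementation.
import numpy as np
from sklearn.svm import SVR

def evaluate_policy_svr(transitions, policy, gamma=0.99, epsilon=0.1, C=10.0, n_sweeps=5):
    """Approximate policy evaluation with an epsilon-insensitive SVR.

    transitions: list of (state, action, reward, next_state), where states and
    actions are 1-D numpy arrays (hypothetical layout for this sketch).
    policy: callable mapping a next state to the action the policy would take.
    """
    # Kernel regression inputs: concatenated (state, action) pairs.
    X = np.array([np.concatenate([s, a]) for s, a, _, _ in transitions])
    rewards = np.array([r for _, _, r, _ in transitions])
    X_next = np.array([np.concatenate([s2, policy(s2)]) for _, _, _, s2 in transitions])

    model = SVR(kernel="rbf", C=C, epsilon=epsilon)
    targets = rewards.copy()  # start from immediate rewards
    for _ in range(n_sweeps):
        # Epsilon-insensitive fit: residuals smaller than epsilon are ignored,
        # which keeps only a sparse subset of samples as support vectors.
        model.fit(X, targets)
        # Refresh one-step Bellman targets with the current approximation.
        targets = rewards + gamma * model.predict(X_next)
    return model  # model.support_ holds the indices of the sparse support set
```

In an API-style loop one would alternate this evaluation step with a greedy policy-improvement step over the learned Q-approximation; the paper's incremental SVR additionally updates the model sample by sample rather than refitting in batch.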
