Abstract

The classical theory of reinforcement learning focuses on the tabular setting, where states and actions are finite, or on linear representations of the value function in a finite-dimensional approximation. Establishing theory for general continuous state and action spaces requires a careful treatment of the complexity of appropriately chosen function spaces and of the iterative updates of the value function under stochastic gradient descent (SGD). For the classical prediction problem in reinforcement learning with i.i.d. streaming data, in the framework of reproducing kernel Hilbert spaces, we establish polynomial sample complexity that takes into account the smoothness of the value function. In particular, we prove that with appropriately chosen step sizes the gradient descent algorithm efficiently computes the value function, with a convergence rate that can approach 1/N, the best possible rate for parametric SGD. The advantages of the gradient descent algorithm include its computational convenience and its natural ability to handle streaming data.
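To make the setting concrete, here is a minimal sketch of one-pass kernel-based SGD for the prediction (policy evaluation) problem on streaming transitions. The names (`kernel_td_sgd`, `rbf_kernel`), the Gaussian kernel, and the polynomially decaying step-size schedule are illustrative assumptions, not the paper's exact algorithm or step-size choices.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel, one common choice of reproducing kernel (assumption)."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * bandwidth ** 2))

def kernel_td_sgd(stream, gamma=0.9, kernel=rbf_kernel,
                  step=lambda t: 1.0 / (1.0 + t) ** 0.6):
    """One-pass functional SGD for policy evaluation in an RKHS (sketch).

    `stream` yields i.i.d. transitions (s, r, s_next). The value estimate is
    kept as a kernel expansion V(x) = sum_i c_i * K(x_i, x); each transition
    appends one center whose coefficient is -eta_t times the TD residual.
    The polynomially decaying step size is an illustrative assumption.
    """
    centers, coeffs = [], []

    def V(x):
        return sum(c * kernel(xi, x) for c, xi in zip(coeffs, centers))

    for t, (s, r, s_next) in enumerate(stream):
        td_error = V(s) - (r + gamma * V(s_next))  # semi-gradient TD residual
        centers.append(s)
        coeffs.append(-step(t) * td_error)         # V <- V - eta_t * delta_t * K(s_t, .)
    return V

# Toy usage (hypothetical): a random-walk chain on [0, 1] with reward r(s) = s.
rng = np.random.default_rng(0)
def transitions(n):
    for _ in range(n):
        s = rng.uniform(0, 1, size=1)  # i.i.d. sampled state, per the streaming model
        s_next = np.clip(s + rng.normal(0, 0.1, size=1), 0, 1)
        yield s, float(s[0]), s_next

V = kernel_td_sgd(transitions(500))
print(V(np.array([0.5])))
```

Note that each update appends one kernel center, so a naive single-pass implementation costs O(N^2) kernel evaluations over N samples; sparsification or random-feature approximations are standard remedies, though neither is claimed by the abstract.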
