Abstract

In optimization problems where only measurements of the objective function are available, the gradient of the objective function is difficult or impossible to obtain directly. Although the second-order simultaneous perturbation stochastic approximation (2SPSA) algorithm solves this problem with an efficient gradient approximation that relies only on measurements of the objective function, its accuracy depends on the conditioning of the objective function's Hessian. To eliminate the influence of the Hessian's conditioning, this paper uses a nonlinear conjugate gradient method to determine the search direction. By combining different nonlinear conjugate gradient formulas, it ensures that every search direction is a descent direction. Beyond the improved search direction, this paper also uses an inexact line search to determine the step size. With a descent search direction and an appropriate step size, the improved SPSA converges faster than 2SPSA. The advantages of the improved SPSA are validated by applying it to reinforcement learning.
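The abstract names three ingredients: an SPSA-style gradient estimate built from paired function measurements, a nonlinear conjugate gradient direction with a safeguard that keeps it a descent direction, and an inexact line search for the step size. The paper's exact update rules are not reproduced here, so the sketch below is a minimal illustration under assumed choices — a Polak-Ribière+ formula for the conjugate gradient coefficient, a restart rule as the descent safeguard, and Armijo backtracking as the inexact line search; all function names and constants are hypothetical, not the paper's.

```python
import numpy as np

def spsa_gradient(f, x, c, rng):
    """Simultaneous-perturbation gradient estimate from two f-measurements."""
    delta = rng.choice([-1.0, 1.0], size=x.shape)  # Bernoulli +/-1 perturbation
    return (f(x + c * delta) - f(x - c * delta)) / (2.0 * c) / delta

def spsa_cg(f, x0, iters=300, c=1e-3, seed=0):
    """Gradient-free minimization: SPSA gradient estimates combined with a
    nonlinear conjugate gradient direction (Polak-Ribiere+ assumed here)
    and an inexact Armijo backtracking line search. A sketch, not the
    paper's algorithm; constants are illustrative."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    g = spsa_gradient(f, x, c, rng)
    d = -g  # start from the steepest-descent direction
    for _ in range(iters):
        if g @ d >= 0.0:  # restart whenever d is not a descent direction
            d = -g
        # inexact line search: halve the step until the Armijo condition holds
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx + 1e-4 * t * (g @ d) and t > 1e-10:
            t *= 0.5
        x = x + t * d
        g_new = spsa_gradient(f, x, c, rng)
        denom = g @ g
        beta = max(0.0, (g_new @ (g_new - g)) / denom) if denom > 1e-12 else 0.0
        d = -g_new + beta * d  # conjugate gradient direction update
        g = g_new
    return x
```

Note the restart rule: whenever the estimated gradient and the current direction fail the descent test, the direction falls back to the negative estimated gradient, which is the property the abstract attributes to combining different conjugate gradient formulas.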
