Abstract

We propose the first online actor-critic scheme with adaptive basis to find a local optimal control policy for a Markov Decision Process (MDP) under the weighted discounted cost objective. We parameterize both the policy in the actor and the value function in the critic. The actor performs gradient search in the space of policy parameters using simultaneous perturbation stochastic approximation (SPSA) gradient estimates. This gradient computation requires estimates of value function that are provided by the critic by minimizing a mean square Bellman error objective. In order to obtain good estimates of the value function, the critic adaptively tunes the basis functions (or the features) to obtain the best representation of the value function using gradient search in the Grassmanian of features. Our control algorithm makes use of multi-timescale stochastic approximation. The actor updates its parameters along the slowest time scale. The critic uses two time scales to estimate the value function. For any given feature value, our algorithm performs gradient search in the parameter space via a residual gradient scheme on the faster timescale and, on a medium timescale, performs gradient search in the Grassman manifold of features. We provide an outline of the proof of convergence of our control algorithm to a locally optimum policy. We show empirical results using our algorithm as well as a similar algorithm that uses temporal difference (TD) learning in place of the residual gradient scheme for the faster timescale updates.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.