An actor critic algorithm based on Grassmanian search

Prabuchandran K.J, Shalabh Bhatnagar, Vivek S Borkar

doi:10.1109/cdc.2014.7039948

Abstract

We propose the first online actor-critic scheme with an adaptive basis to find a locally optimal control policy for a Markov Decision Process (MDP) under the weighted discounted cost objective. We parameterize both the policy in the actor and the value function in the critic. The actor performs gradient search in the space of policy parameters using simultaneous perturbation stochastic approximation (SPSA) gradient estimates. This gradient computation requires estimates of the value function, which the critic provides by minimizing a mean square Bellman error objective. To obtain good estimates of the value function, the critic adaptively tunes the basis functions (or features) to obtain the best representation of the value function, using gradient search in the Grassmannian of features. Our control algorithm makes use of multi-timescale stochastic approximation. The actor updates its parameters along the slowest timescale. The critic uses two timescales to estimate the value function. For any given feature value, our algorithm performs gradient search in the parameter space via a residual gradient scheme on the faster timescale and, on a medium timescale, performs gradient search in the Grassmann manifold of features. We provide an outline of the proof of convergence of our control algorithm to a locally optimal policy. We show empirical results using our algorithm as well as a similar algorithm that uses temporal difference (TD) learning in place of the residual gradient scheme for the faster timescale updates.
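
The SPSA gradient estimate mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the cost function is a hypothetical quadratic standing in for the critic's value estimate, the step size is constant rather than following a multi-timescale schedule, and the names `spsa_gradient`, `spsa_descent`, and `quadratic_cost` are assumptions introduced for illustration only.

```python
import random


def spsa_gradient(cost, theta, delta=0.1, rng=None):
    """Two-sided SPSA estimate of the gradient of `cost` at `theta`."""
    rng = rng or random.Random()
    # Draw one random +/-1 perturbation per coordinate.
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * s for t, s in zip(theta, signs)]
    minus = [t - delta * s for t, s in zip(theta, signs)]
    # Only two cost evaluations are needed, whatever the dimension.
    diff = cost(plus) - cost(minus)
    return [diff / (2.0 * delta * s) for s in signs]


def quadratic_cost(theta):
    # Hypothetical stand-in for the critic's value estimate (assumption).
    return sum((t - 1.0) ** 2 for t in theta)


def spsa_descent(cost, theta, steps=500, step_size=0.05, delta=0.1, seed=0):
    """Plain gradient descent driven by SPSA estimates (constant step size)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = spsa_gradient(cost, theta, delta, rng)
        theta = [t - step_size * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    print(spsa_descent(quadratic_cost, [0.0, 0.0, 0.0]))
```

In this toy setting the iterates move toward the minimizer of the quadratic. In an actor-critic scheme of the kind the abstract describes, the two cost evaluations would instead come from value estimates supplied by the critic for the perturbed policy parameters.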

Similar Papers

Actor-Critic Algorithms with Online Feature Adaptation
K J Prabuchandran ... Shalabh Bhatnagar
ACM Transactions on Modeling and Computer Simulation | VOL. 26
09 Feb 2016

Feature Search in the Grassmanian in Online Reinforcement Learning
Shalabh Bhatnagar ... Prabuchandran K J
IEEE Journal of Selected Topics in Signal Processing | VOL. 7
01 Oct 2013

Model-Free Indirect RL: Temporal Difference
Shengbo Eben Li
01 Jan 2023

Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes
Mohammed Shahid Abdulla ... Shalabh Bhatnagar
Discrete Event Dynamic Systems | VOL. 17
04 Jan 2007
