Actor-Critic Algorithms with Online Feature Adaptation

K J Prabuchandran, Vivek S Borkar, Shalabh Bhatnagar

doi:10.1145/2868723

Abstract

We develop two new online actor-critic control algorithms with adaptive feature tuning for Markov Decision Processes (MDPs). One of our algorithms is proposed for the long-run average cost objective, while the other works for discounted cost MDPs. Our actor-critic architecture incorporates parameterization both in the policy and the value function. A gradient search in the policy parameters is performed to improve the performance of the actor. The computation of the aforementioned gradient, however, requires an estimate of the value function of the policy corresponding to the current actor parameter. The value function, on the other hand, is approximated using linear function approximation and obtained from the critic. The error in approximation of the value function, however, results in suboptimal policies. In our article, we also update the features by performing a gradient descent on the Grassmannian of features to minimize a mean square Bellman error objective in order to find the best features. The aim is to obtain a good approximation of the value function and thereby ensure convergence of the actor to locally optimal policies. In order to estimate the gradient of the objective in the case of the average cost criterion, we utilize the policy gradient theorem, while in the case of the discounted cost objective, we utilize the simultaneous perturbation stochastic approximation (SPSA) scheme. We prove that our actor-critic algorithms converge to locally optimal policies. Experiments on two different settings show performance improvements resulting from our feature adaptation scheme.
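The abstract names simultaneous perturbation stochastic approximation (SPSA) as the gradient estimator for the discounted cost case. The following is a minimal sketch of the generic two-sided SPSA estimate on a toy quadratic cost, not the article's algorithm: the function names, the perturbation size `c`, and the cost function are all assumptions made for illustration, and the article may use a different SPSA variant.

```python
import numpy as np

def spsa_gradient(cost, theta, c=0.1, rng=None):
    """One two-sided SPSA estimate of the gradient of `cost` at `theta`."""
    rng = np.random.default_rng() if rng is None else rng
    # Perturb all coordinates at once with independent +/-1 (Rademacher) signs.
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    # Only two cost evaluations are needed, whatever the dimension of theta.
    diff = cost(theta + c * delta) - cost(theta - c * delta)
    return diff / (2.0 * c * delta)

def quadratic_cost(theta):
    """Toy stand-in for a policy's cost (hypothetical); true gradient is 2 * theta."""
    return float(np.sum(theta ** 2))

def averaged_spsa(cost, theta, n_samples=20000, seed=0):
    """Average many single-sample estimates to show they centre on the gradient."""
    rng = np.random.default_rng(seed)
    estimate = np.zeros_like(theta)
    for _ in range(n_samples):
        estimate += spsa_gradient(cost, theta, rng=rng)
    return estimate / n_samples

if __name__ == "__main__":
    theta = np.array([1.0, 2.0, 3.0])
    print("averaged SPSA estimate:", averaged_spsa(quadratic_cost, theta))
    print("true gradient:         ", 2.0 * theta)
```

A single estimate is noisy, since each coordinate picks up cross terms from the other coordinates. In a stochastic approximation scheme that noise is typically averaged out by the diminishing step sizes rather than by explicit batching as done here.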

Journal: ACM Transactions on Modeling and Computer Simulation
Publication Date: Feb 9, 2016
Citations: 9

Similar Papers

An actor critic algorithm based on Grassmanian search
Prabuchandran K.J ... Vivek S Borkar
01 Dec 2014

Piecewise linear value function approximation for factored MDPs
...
28 Jul 2002

Approximate Dynamic Programming with (min; +) linear function approximation for Markov decision processes
L Chandrashekar ... Shalabh Bhatnagar
01 Dec 2014

Multi-agent temporal-difference learning with linear function approximation: Weak convergence under time-varying network topologies
Milos S Stankovic ... Srdjan S Stankovic
01 Jul 2016
