Abstract

We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies based on Predictive State Representations (PSRs). We compare PSR policies to Finite-State Controllers (FSCs), which are a standard model for policy-gradient methods in POMDPs. We present a general Actor-Critic algorithm for learning both FSCs and PSR policies. The critic computes a value function whose variables are the parameters of the policy; these parameters are gradually updated to maximize the value function. We show that the value function is polynomial for both FSCs and PSR policies, with a potentially smaller degree in the case of PSR policies. Therefore, the value function of a PSR policy can have fewer local optima than that of the equivalent FSC, and consequently the gradient algorithm is more likely to converge to a globally optimal solution.
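
The following is a minimal, hypothetical sketch of the gradient-ascent loop described in the abstract, assuming the critic exposes a scalar value function over the policy parameters (a toy polynomial stands in for the real FSC/PSR value function). Names such as `critic_value` and `policy_gradient_ascent` are illustrative and not taken from the paper.

```python
import numpy as np

def critic_value(theta: np.ndarray) -> float:
    # Placeholder critic: a smooth polynomial in the policy parameters.
    # In the paper's setting this would be the value function of the
    # FSC or PSR policy, which is itself polynomial in theta.
    return float(-np.sum((theta - 0.5) ** 2))

def numerical_gradient(f, theta: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Central finite differences; an analytic gradient would be used in practice.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

def policy_gradient_ascent(theta0: np.ndarray, lr: float = 0.1, iters: int = 200) -> np.ndarray:
    # Actor update: gradually move the policy parameters in the direction
    # that increases the critic's value estimate.
    theta = theta0.copy()
    for _ in range(iters):
        theta += lr * numerical_gradient(critic_value, theta)
    return theta

if __name__ == "__main__":
    theta = policy_gradient_ascent(np.zeros(4))
    print("learned parameters:", theta)   # converges toward 0.5
    print("value:", critic_value(theta))
```

Because the toy value function here is concave, the ascent reaches its unique optimum; the abstract's point is that a lower-degree polynomial value function (as with PSR policies) makes such convergence to a global optimum more likely in general.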
