Abstract

We present a value iteration algorithm for learning to act in Partially Observable Markov Decision Processes (POMDPs) with continuous state spaces. Mainstream POMDP research focuses on the discrete case, and this complicates its application to, e.g., robotic problems that are naturally modeled using continuous state spaces. The main difficulty in defining a (belief-based) POMDP over a continuous state space is that expected values over states must be defined using integrals that, in general, cannot be computed in closed form. In this paper, we provide three main contributions to the literature on continuous-state POMDPs. First, we show that the optimal finite-horizon value function over the continuous infinite-dimensional POMDP belief space is piecewise linear and convex, and is defined by a finite set of supporting α-functions that are analogous to the α-vectors (hyperplanes) defining the value function of a discrete-state POMDP. Second, we show that, for a fairly general class of POMDP models in which all functions of interest are modeled by Gaussian mixtures, all belief updates and value iteration backups can be carried out analytically and exactly. Contrary to the discrete case, in a continuous-state POMDP the α-functions may grow in size (e.g., in the number of Gaussian components) with each value iteration. Third, we show how recent point-based value iteration algorithms for discrete POMDPs can be extended to the continuous case, allowing for efficient planning in practical problems. In particular, we demonstrate Perseus, our previously proposed randomized point-based value iteration algorithm, on a simple robot planning problem in a continuous domain, where encouraging results are observed.
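To make the Gaussian-mixture machinery concrete, the following is a minimal sketch (with hypothetical function names, not the paper's implementation) of a closed-form belief update in one dimension: when both the belief b(s) and the observation likelihood p(o|s) are Gaussian mixtures, the Bayes update b'(s) ∝ p(o|s) b(s) is again a Gaussian mixture, obtained analytically from pairwise products of components. It also illustrates the growth noted above, since the posterior has one component per pair.

import numpy as np

def gaussian_pdf(x, mean, var):
    # Density of a 1-D Gaussian N(x; mean, var)
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def belief_update(belief, likelihood):
    """Multiply two Gaussian mixtures, given as lists of (weight, mean, var),
    and renormalize. Each pair of components yields one posterior component,
    using the closed-form product of two Gaussians."""
    updated = []
    for w_b, m_b, v_b in belief:
        for w_l, m_l, v_l in likelihood:
            v = 1.0 / (1.0 / v_b + 1.0 / v_l)      # product variance
            m = v * (m_b / v_b + m_l / v_l)        # product mean
            z = gaussian_pdf(m_b, m_l, v_b + v_l)  # normalizing constant
            updated.append((w_b * w_l * z, m, v))
    total = sum(w for w, _, _ in updated)
    return [(w / total, m, v) for w, m, v in updated]

belief = [(0.5, -1.0, 1.0), (0.5, 2.0, 0.5)]     # 2 components
likelihood = [(0.7, 0.0, 0.3), (0.3, 1.5, 0.2)]  # 2 components
posterior = belief_update(belief, likelihood)
print(len(posterior))  # 4 components: sizes multiply at each update

This multiplicative growth in components is the continuous-state analogue of the α-function growth mentioned above, and in practice it suggests keeping the mixtures compact, for instance by pruning low-weight components.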
