Algorithms for partially observable Markov decision processes

Hsien-Te Cheng, Shelby L Brumelle

doi:10.14288/1.0098252

Abstract

The thesis develops methods to solve discrete-time finite-state partially observable Markov decision processes. For the infinite horizon problem, only the discounted reward case is considered. For the finite horizon problem, two new algorithms are developed. The first is called the relaxed region algorithm. For each support in the value function, this algorithm determines a region not smaller than its support region and modifies it implicitly in later steps until the exact support region is found. The second, called the linear support algorithm, systematically approximates the value function until all supports in the value function are found. The most important feature of this algorithm is that it can be modified to find an approximate value function. Both algorithms are shown to be more efficient than the one-pass algorithm. For the infinite horizon problem, it is first shown that the approximation version of the linear support algorithm can be substituted for the policy improvement step in a standard successive approximation method to obtain an $\epsilon$-optimal value function. Next, an iterative discretization procedure is developed which uses a small number of states to find new supports and improve the value function between two policy improvement steps. Since only a finite number of states are chosen in this process, some techniques developed for finite MDPs can be applied here. Finally, we prove that the policy improvement step in the iterative discretization procedure can be replaced by the approximation version of the linear support algorithm. The last part of the thesis deals with problems with continuous signals. We first show that if the signal processes are uniformly distributed, then the problem can be reformulated as a problem with a finite number of signals. The result is then extended to the case where the signal processes are step functions. Since step functions can easily be used to approximate most probability distributions, this method can be used to approximate most problems with continuous signals. Finally, we present some conditions which guarantee that the linear support can be computed for any given state; the methods developed for the finite signal case can then be easily modified and applied to problems for which the conditions hold.
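
As a minimal illustration of the terms "support" and "support region" used above, the following Python sketch assumes the standard piecewise-linear convex representation of a finite-horizon value function, in which the value at a belief state is the maximum of finitely many linear functions of the belief. The function name `value_and_support` and the numbers in `SUPPORTS` are hypothetical and are not taken from the thesis; this is not the relaxed region or linear support algorithm itself, only the representation those algorithms operate on.

```python
import numpy as np

def value_and_support(belief, supports):
    """Evaluate V(b) = max_k <alpha_k, b> and report which support attains it.

    belief   : probability vector over the hidden states
    supports : array with one linear support (alpha vector) per row
    """
    belief = np.asarray(belief, dtype=float)
    supports = np.asarray(supports, dtype=float)
    scores = supports @ belief          # inner product of each support with b
    k = int(np.argmax(scores))          # b lies in the support region of row k
    return float(scores[k]), k

# Hypothetical two-state example with three supports (illustrative values only).
SUPPORTS = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.6],
])

if __name__ == "__main__":
    for b in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
        value, k = value_and_support(b, SUPPORTS)
        print(b, value, k)
```

Under this assumption, the support region of a given support is the set of belief states at which that support attains the maximum; in the example, the third support is maximal around the uniform belief and the first two near the corners of the belief simplex.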

