Partially Observable Risk-Sensitive Markov Decision Processes

Nicole Bäuerle, Ulrich Rieder

doi:10.1287/moor.2016.0844

Abstract

We consider the problem of minimizing a certainty equivalent of the total or discounted cost over a finite and an infinite time horizon that is generated by a partially observable Markov decision process (POMDP). In contrast to a risk-neutral decision maker, this optimization criterion takes the variability of the cost into account. It contains as a special case the classical risk-sensitive optimization criterion with an exponential utility. We show that this optimization problem can be solved by embedding the problem into a completely observable Markov decision process with extended state space and give conditions under which an optimal policy exists. The state space has to be extended by the joint conditional distribution of current unobserved state and accumulated cost. In case of an exponential utility, the problem simplifies considerably and we rediscover what in previous literature has been named information state. However, since we do not use any change of measure techniques here, our approach is simpler. A simple example, namely, a risk-sensitive Bayesian house selling problem, is considered to illustrate our results.
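As a rough illustration of the criterion described in the abstract, the sketch below computes a certainty equivalent U^{-1}(E[U(C)]) for a discrete cost distribution, and specializes it to the exponential utility U(x) = exp(gamma * x). The function names, the two-point cost distribution, and the value of gamma are assumptions made for this example only; they are not taken from the paper, and the sketch does not implement the POMDP embedding itself.

```python
import math


def certainty_equivalent(costs, probs, utility, utility_inverse):
    """Return U^{-1}(E[U(C)]) for a discrete cost distribution.

    costs and probs are parallel sequences; probs must sum to 1.
    (Hypothetical helper for illustration.)
    """
    expected_utility = sum(p * utility(c) for c, p in zip(costs, probs))
    return utility_inverse(expected_utility)


def exponential_certainty_equivalent(costs, probs, gamma):
    """Certainty equivalent under U(x) = exp(gamma * x) with gamma > 0,
    which equals (1 / gamma) * log E[exp(gamma * C)]."""
    return certainty_equivalent(
        costs,
        probs,
        utility=lambda x: math.exp(gamma * x),
        utility_inverse=lambda y: math.log(y) / gamma,
    )


# Illustrative (made-up) total cost: 0 or 10 with equal probability.
costs = [0.0, 10.0]
probs = [0.5, 0.5]

mean_cost = sum(p * c for c, p in zip(costs, probs))
risk_sensitive_cost = exponential_certainty_equivalent(costs, probs, gamma=0.5)

# A deterministic cost is its own certainty equivalent.
deterministic_cost = exponential_certainty_equivalent([7.0], [1.0], gamma=0.5)
```

For a convex increasing utility such as this one, the certainty equivalent of a random cost is at least its mean, which is the sense in which the criterion penalizes variability; for a deterministic cost the two coincide.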

Journal: Mathematics of Operations Research
Publication Date: Nov 1, 2017
Citations: 24

Similar Papers

More Risk-Sensitive Markov Decision Processes
Nicole Bäuerle ... Ulrich Rieder
Mathematics of Operations Research | VOL. 39
01 Feb 2014

Contraction Mappings in the Theory Underlying Dynamic Programming
Eric V Denardo
SIAM Review | VOL. 9
01 Apr 1967

Convergence and Near Optimality of Q-Learning with Finite Memory for Partially Observed Models
Ali Devran Kara ... Serdar Yuksel
14 Dec 2021

Optimal inspection and maintenance planning for deteriorating structural components through dynamic Bayesian networks and Markov decision processes
P.G Morato ... P Rigo
Structural Safety | VOL. 94
30 Oct 2021
