Abstract

Markov decision processes are often specified with only limited knowledge of the real system's behavior, or operate in a partially unknown environment, so that transition rates and rewards are not exactly known. Several models have been proposed to describe this uncertainty in a formal way. In all cases, it is important to account for the uncertainty when computing optimal control policies. Usually this is done by computing robust solutions that are optimal under the worst realization of the uncertainty; however, such solutions tend to be very conservative. In this paper, we develop an approach that mitigates this conservatism by computing policies that are optimal in a predefined situation, such as the average case, but also guarantee a minimal gain in all other situations, including the worst case. We present algorithms based on policy iteration that solve subproblems using Mixed Integer Linear Programming (MILP) or Nonlinear Programming (NLP).
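To make the idea concrete, here is a minimal sketch, not the paper's algorithm: all numbers, the two-state MDP, the scenario set, and the brute-force enumeration are illustrative assumptions. It evaluates every deterministic policy under a finite set of transition-probability scenarios and keeps the policy with the best average-case value among those whose worst-case value still meets a required threshold. The paper's methods would replace the enumeration with policy iteration whose subproblems are solved by MILP or NLP.

```python
# Illustrative sketch only (hypothetical MDP and parameters): brute-force
# search over the deterministic policies of a tiny two-state, two-action
# MDP whose transition probabilities are uncertain, modeled here as a
# finite set of scenarios. We choose the policy with the best
# average-case value subject to a worst-case guarantee.
import itertools
import numpy as np

GAMMA = 0.9        # discount factor (assumed)
THRESHOLD = 12.0   # required minimal value in the start state (assumed)

# Rewards r[s, a] (made-up numbers).
rewards = np.array([[1.0, 0.5],
                    [0.0, 2.0]])

# Uncertainty set: a few transition scenarios, each giving P[s, a, s'].
scenarios = [
    np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]]),
    np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.6, 0.4], [0.3, 0.7]]]),
]

def evaluate(policy, P):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi."""
    n = len(policy)
    P_pi = np.array([P[s, policy[s]] for s in range(n)])
    r_pi = np.array([rewards[s, policy[s]] for s in range(n)])
    return np.linalg.solve(np.eye(n) - GAMMA * P_pi, r_pi)

best_policy, best_avg = None, -np.inf
for policy in itertools.product(range(2), repeat=2):
    # Value of the start state (state 0) in every scenario.
    values = [evaluate(policy, P)[0] for P in scenarios]
    worst, avg = min(values), float(np.mean(values))
    # Keep only policies that guarantee the minimal gain in every
    # scenario, then maximize the average-case value among them.
    if worst >= THRESHOLD and avg > best_avg:
        best_policy, best_avg = policy, avg

print("chosen policy:", best_policy, "average-case value:", best_avg)
```

The enumeration makes the trade-off explicit: a purely robust solution would maximize the worst-case value alone, whereas here the worst case only acts as a constraint while the average case is optimized.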
