Markov decision processes with a stopping time constraint

Masayuki Horiguchi

doi:10.1007/pl00003996

Abstract

In this paper, the optimization problem for a stopped Markov decision process with finite states and actions is considered over stopping times τ constrained so that ?τ≦α for some fixed α>0. The problem is solved through randomization of stopping times and mathematical programming formulation by occupation measures. Another representation, called F-representation, of randomized stopping times is given, by which the concept of Markov or stationary randomized stopping times is introduced. We treat two types of occupation measures, running and stopped, but stopped occupation measure is shown to be expressed by running one. We study the properties of the set of running occupation measures achieved by different classes of pairs of policies and randomized stopping times. Analyzing the equivalent mathematical programming problem formulated by running occupation measures corresponding with stationary policies and stationary randomized stopping times, we prove the existence of an optimal constrained pair of stationary policy and stopping time requiring randomization in at most one state.

Full Text