Abstract

We consider constrained Markov decision processes (MDPs) with compact state and action spaces under long-run average reward or cost criteria, and characterize an optimal pair of initial state distribution and policy that maximizes, over all policies, the essential infimum of the sample-path average reward subject to multiple average-cost constraints. First, using the idea of occupation measures, the constrained MDPs considered here are equivalently transformed into an infinite-dimensional linear program, and the existence of an optimal pair associated with a randomized stationary policy is shown by a compactness argument. Secondly, we introduce the notion of a state-wise mixed stationary policy (s.w.m. policy) to obtain further results: for any ε > 0, the existence of an ε-optimal pair associated with an s.w.m. policy is proved inductively using a Lagrange multiplier. In the case of a countable state space, the existence of an optimal pair associated with an s.w.m. policy is shown.
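The occupation-measure reduction described above can be illustrated on a toy finite instance. The sketch below is a hypothetical example, not the paper's construction (which concerns compact spaces and an infinite-dimensional linear program): for a 2-state, 2-action MDP, the long-run average reward is maximized over occupation measures x(s, a) subject to the stationarity (balance) equations, normalization, and one average-cost constraint, using `scipy.optimize.linprog`. All numbers (rewards, costs, the budget 0.4) are invented for illustration.

```python
# Hypothetical finite-state sketch of the occupation-measure LP reduction.
# Variables: occupation measure x[s, a] >= 0 on state-action pairs.
import numpy as np
from scipy.optimize import linprog

S, A = 2, 2
# Transition kernel P[s, a, s']: uniform, so the chain mixes under any policy.
P = np.full((S, A, S), 0.5)
r = np.array([[1.0, 2.0], [1.0, 2.0]])  # reward: the "risky" action a=1 pays more
c = np.array([[0.0, 1.0], [0.0, 1.0]])  # cost: the risky action incurs cost 1
budget = 0.4                            # average-cost constraint level (invented)

# Objective: maximize sum_{s,a} x[s,a] r[s,a]  ->  minimize -r . x
obj = -r.flatten()

# Balance (invariance) equations: sum_a x[s',a] = sum_{s,a} x[s,a] P[s,a,s'],
# plus normalization so that x is a probability measure.
A_eq = np.zeros((S + 1, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = (1.0 if s == sp else 0.0) - P[s, a, sp]
A_eq[S, :] = 1.0
b_eq = np.zeros(S + 1)
b_eq[S] = 1.0

# Average-cost constraint: sum_{s,a} x[s,a] c[s,a] <= budget
A_ub = c.flatten()[None, :]
b_ub = [budget]

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=(0, None), method="highs")
x = res.x.reshape(S, A)
print("optimal average reward:", -res.fun)
# The optimizer randomizes between actions: the induced stationary policy
# plays action a in state s with probability x[s, a] / x[s].sum(), which is
# the randomized stationary policy the existence result refers to.
```

Under these invented numbers the unconstrained optimum (always play a=1) violates the budget, so the LP mixes the two actions, yielding average reward 1.4; the randomization in the optimal occupation measure is exactly why randomized (and state-wise mixed) stationary policies appear in the constrained setting.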
