Abstract

This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semicontinuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, subject to several inequality constraints on similar criteria defined by other reward functions. Suppose a feasible policy exists for a problem with M constraints. We prove two results on the existence and structure of optimal policies. First, we show that there exists a randomized stationary optimal policy which requires at most M more actions than a nonrandomized stationary one. This result is known for several particular cases. Second, we prove that there exists an optimal policy which is (i) stationary (nonrandomized) from some step onward, (ii) randomized Markov before this step, with the total number of actions added by randomization being at most M, and (iii) such that the total number of actions added by nonstationarity is also at most M. We also establish Pareto optimality of policies from the two classes described above for multi-criteria problems. We describe an algorithm to compute optimal policies with properties (i)–(iii) for constrained problems. The policies that satisfy properties (i)–(iii) have the appealing property that the amount of randomization they require along any trajectory is bounded by the number of constraints. In contrast, a randomized stationary policy may require an infinite number of randomizations over time.
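For concreteness, the constrained problem described in the abstract can be written in the standard form below; the symbols (initial state x, discount factor beta, reward functions r_0, ..., r_M, constraint levels c_1, ..., c_M, and criteria V_m) are not fixed by the abstract and are introduced here only as an illustrative sketch of the formulation:

\[
\max_{\pi}\; V_0(x,\pi) \;=\; \mathbb{E}_x^{\pi}\!\left[\sum_{t=0}^{\infty}\beta^{t}\, r_0(x_t,a_t)\right]
\quad\text{subject to}\quad
V_m(x,\pi) \;=\; \mathbb{E}_x^{\pi}\!\left[\sum_{t=0}^{\infty}\beta^{t}\, r_m(x_t,a_t)\right] \;\ge\; c_m,
\qquad m = 1,\dots,M,
\]

where \(x_t\) and \(a_t\) denote the state and action at step \(t\), the maximization is over all policies \(\pi\), and each of the M inequality constraints is a discounted criterion of the same form as the objective but with a different reward function.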
