Abstract

This paper deals with the computation of an optimal policy for Markov decision processes involving continuous movement as well as switching costs. Recently the author derived conditions for the optimality of a policy in such decision processes. These conditions can be used to verify the optimality of a given stationary policy but cannot be used to obtain one directly; a computational procedure is needed to arrive at a stationary optimal policy. In this paper we develop an algorithm that generates a sequence of successively improving policies converging (at least along a subsequence) to an optimal stationary policy. Two special cases are considered. The first is a general continuous-time Markov decision process with a countable state space. In this case the sufficient conditions for optimality suggest an algorithmic procedure, and it is shown that this algorithm either terminates at a stationary optimal policy or converges to one (at least along a subsequence). The second special case is a controlled one-dimensional diffusion process. Here the simple algorithm suggested by the sufficient conditions does give a sequence of successively improving policies; however, it may terminate at or converge to a suboptimal policy. An additional step in the algorithm is proposed, and it is shown that the modified algorithm does work: it either terminates at a stationary optimal policy or converges to one along a subsequence. Similarly modified algorithms can be developed for Markov decision processes in which the underlying process is compound Poisson with a drift; such processes frequently occur in controlled queues and inventory systems.
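The policy-improvement scheme sketched in the abstract can be illustrated, in its simplest finite setting, by standard policy iteration for a discounted Markov decision process: alternate policy evaluation (solve a linear system for the current policy's value) with greedy policy improvement until the policy stops changing. This is a minimal sketch only; the state space, actions, rewards, and transition matrices below are hypothetical and are not taken from the paper, which treats continuous-time and diffusion settings.

```python
# Illustrative policy iteration for a small finite discounted MDP
# (a finite-state stand-in for the successive-improvement scheme;
# all data below are hypothetical).
import numpy as np

def policy_iteration(P, r, gamma=0.9, max_iter=100):
    """P[a] is the transition matrix under action a (rows = states);
    r[a][s] is the reward for taking action a in state s."""
    n_actions = len(P)
    n_states = P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Policy evaluation: v = (I - gamma * P_pi)^{-1} r_pi
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([r[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead against v
        q = np.array([r[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break  # terminated at a stationary optimal policy
        policy = new_policy
    return policy, v

# Hypothetical two-state example: action 0 stays put, action 1 switches state.
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
r = [np.array([0.0, 2.0]), np.array([1.0, 0.0])]
policy, v = policy_iteration(P, r)
```

Each iteration produces a policy at least as good as the last, mirroring the successively improving sequence described above; in this finite setting the loop terminates at a stationary optimal policy, whereas the paper's continuous settings require the additional modification discussed in the abstract.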