Abstract

This paper deals with the computation of an optimal policy for Markov decision processes involving continuous movement as well as switching costs. Recently the author derived conditions for the optimality of a policy in such decision processes. These conditions can be used to verify the optimality of a given stationary policy but cannot be used to obtain one directly; a computational procedure is needed to arrive at a stationary optimal policy. In this paper we develop an algorithm that generates a sequence of successively improving policies converging (at least along a subsequence) to an optimal stationary policy. Two special cases are considered. The first is a general continuous-time Markov decision process with a countable state space. In this case the sufficient conditions for optimality suggest an algorithmic procedure, and it is shown that this algorithm either terminates at a stationary optimal policy or converges to one (at least along a subsequence). The second special case is a controlled one-dimensional diffusion process. Here the simple algorithm suggested by the sufficient conditions does give a sequence of successively improving policies; however, it may terminate at or converge to a suboptimal policy. An additional step in the algorithm is proposed, and it is shown that the modified algorithm does work: it either terminates at a stationary optimal policy or converges to one along a subsequence. Similarly modified algorithms can be developed for Markov decision processes in which the underlying process is compound Poisson with a drift; such processes frequently occur in controlled queues and inventory systems.
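The policy-improvement scheme sketched in the abstract can be illustrated, in its simplest finite setting, by standard policy iteration for a discounted Markov decision process: alternate policy evaluation (solve a linear system for the current policy's value) with greedy policy improvement until the policy stops changing. This is a minimal sketch only; the state space, actions, rewards, and transition matrices below are hypothetical and are not taken from the paper, which treats continuous-time and diffusion settings.

```python
# Illustrative policy iteration for a small finite discounted MDP
# (a finite-state stand-in for the successive-improvement scheme;
# all data below are hypothetical).
import numpy as np

def policy_iteration(P, r, gamma=0.9, max_iter=100):
    """P[a] is the transition matrix under action a (rows = states);
    r[a][s] is the reward for taking action a in state s."""
    n_actions = len(P)
    n_states = P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Policy evaluation: v = (I - gamma * P_pi)^{-1} r_pi
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([r[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead against v
        q = np.array([r[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break  # terminated at a stationary optimal policy
        policy = new_policy
    return policy, v

# Hypothetical two-state example: action 0 stays put, action 1 switches state.
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
r = [np.array([0.0, 2.0]), np.array([1.0, 0.0])]
policy, v = policy_iteration(P, r)
```

Each iteration produces a policy at least as good as the last, mirroring the successively improving sequence described above; in this finite setting the loop terminates at a stationary optimal policy, whereas the paper's continuous settings require the additional modification discussed in the abstract.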