Abstract
We discuss the problems of discounted-cost performance optimization for a class of semi-Markov decision processes (SMDPs). We define a matrix which can be used as the infinitesimal generator of a Markov process. The discounted Poisson equation is proposed for an SMDP by using this matrix, from which the α-potential is defined. The optimality equation satisfied by the optimal stationary policy is given and the relation between discounted model and average model is discussed. Two iteration algorithms to find e-optimal policies are proposed and the proofs of convergence of these two algorithms are given. A numerical example is provided to illustrate the application of the algorithms.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have