Abstract
This paper studies the constrained (nonhomogeneous) continuous-time Markov decision processes on the finite horizon. The performance criterion to be optimized is the expected total reward on the finite horizon, while N constraints are imposed on similar expected costs. Introducing the appropriate notion of the occupation measures for the concerned optimal control problem, we establish the following under some suitable conditions: (a) the class of Markov policies is sufficient; (b) every extreme point of the space of performance vectors is generated by a deterministic Markov policy; and (c) there exists an optimal Markov policy, which is a mixture of no more than $$N+1$$N+1 deterministic Markov policies.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.