Abstract
This paper establishes the existence of an optimal stationary strategy in a leavable Markov decision process with countable state space and undiscounted total reward criterion.Besides assumptions of boundedness and continuity, an assumption is imposed on the model which demands the continuity of the mean recurrence times on a subset of the stationary strategies, the so-called ‘good strategies'. For practical applications it is important that this assumption is implied by an assumption about the cost structure and the transition probabilities. In the last part we point out that our results in general cannot be deduced from related works on bias-optimality by Dekker and Hordijk, Wijngaard or Mann.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.