Abstract

This paper proposes a value iteration method which finds an ε-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is ε-optimal.
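To make concrete what "an ε-optimal policy in a finite number of iterations" means, the following is a minimal sketch of the textbook undiscounted value iteration with a span stopping rule. It is not the aggregation method proposed in the paper: it assumes every stationary policy is unichain and aperiodic, which is exactly the restriction the multichain setting removes. The function name `span_value_iteration`, the data layout `P[s][a]` and `r[s][a]`, and the toy two-state model are all hypothetical and chosen only for illustration.

```python
def span_value_iteration(P, r, eps=1e-6, max_iter=10_000):
    """Undiscounted value iteration with a span stopping rule.

    Assumption (not from the paper): every stationary policy is
    unichain and aperiodic, so the optimal gain is the same in all states.
    P[s][a] is the list of transition probabilities from state s under
    action a; r[s][a] is the one-step reward.
    """
    n = len(P)
    v = [0.0] * n
    for _ in range(max_iter):
        # One Bellman backup: q[s][a] = r(s,a) + sum_t P(t|s,a) v(t)
        q = [[r[s][a] + sum(p * v[t] for t, p in enumerate(P[s][a]))
              for a in range(len(P[s]))]
             for s in range(n)]
        new_v = [max(row) for row in q]
        diff = [new_v[s] - v[s] for s in range(n)]
        lo, hi = min(diff), max(diff)
        # Under the assumption above, lo and hi bracket the optimal gain,
        # so a span below eps certifies the greedy policy as eps-optimal.
        if hi - lo < eps:
            policy = [max(range(len(q[s])), key=q[s].__getitem__)
                      for s in range(n)]
            return policy, (lo + hi) / 2
        # Subtract a reference value so the iterates stay bounded.
        ref = new_v[0]
        v = [x - ref for x in new_v]
    raise RuntimeError("span criterion not met within max_iter iterations")


# Hypothetical two-state, two-action model with strictly positive
# transition probabilities (hence unichain and aperiodic).
P = [
    [[0.9, 0.1], [0.1, 0.9]],  # state 0: action 0, action 1
    [[0.5, 0.5], [0.9, 0.1]],  # state 1: action 0, action 1
]
r = [
    [1.0, 0.0],
    [2.0, 0.0],
]

policy, gain = span_value_iteration(P, r)
print(policy, gain)
```

In this sketch the stopping test is the sufficient condition for ε-optimality: once the span of the one-step change drops below `eps`, the greedy policy's gain is within `eps` of the optimum. In a multichain process the gain can differ between states, so this bracket no longer holds and the test above is not a valid certificate there.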
