Abstract

Performance optimization is considered for average-cost multichain Markov decision processes (MDPs) with a compact action set. For a general compact multichain model, the optimality equation system may have no solution, and a policy iteration algorithm may yield a suboptimal policy rather than an optimal one. We therefore concentrate on a special case of multichain models in which the classification of states is assumed to be the same under every policy rather than varying with the policy. By using the concept of performance potentials, the existence of solutions to the optimality equation system is established, and a potential-based policy iteration algorithm is then proposed to solve this system. In addition, the algorithm is proved to converge to optimality on the recurrent classes. Finally, a numerical example is provided.
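
For orientation, the optimality equation system referred to above is commonly written, in the standard theory of finite-state multichain average-cost MDPs, as a pair of nested equations. The sketch below uses assumed notation that is not taken from the paper: \eta(i) is the optimal average cost from state i, g(i) is a performance potential (bias-like term), f(i,a) is the one-step cost, p(j | i,a) is the transition probability, and A(i) is the compact action set at state i. The paper's own formulation may differ in detail.

```latex
% First equation: the average cost must be harmonic under an optimal action.
\min_{a \in A(i)} \Big\{ \sum_{j} p(j \mid i,a)\,\eta(j) - \eta(i) \Big\} = 0,
\qquad i \in S,

% Second equation: a Poisson-type equation, restricted to the actions
% A^{*}(i) that attain the minimum in the first equation.
\min_{a \in A^{*}(i)} \Big\{ f(i,a) + \sum_{j} p(j \mid i,a)\,g(j) - g(i) - \eta(i) \Big\} = 0,
\qquad i \in S,

% where
A^{*}(i) = \Big\{ a \in A(i) : \sum_{j} p(j \mid i,a)\,\eta(j) = \eta(i) \Big\}.
```

Under a fixed stationary policy d, the corresponding potentials are usually defined through the Poisson equation (I - P^{d}) g^{d} + \eta^{d} = f^{d}, which is the quantity a potential-based policy iteration would evaluate at each step before improving the policy.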
