Abstract
A singularly perturbed Markov decision process with the limiting average reward criterion is considered. It is assumed that the underlying process is composed of n separate irreducible processes, and that the small perturbation couples these processes into a single irreducible process. Two algorithms for the solution of the underlying limit Markov control problem are presented. The first is a linear program possessing the Dantzig-Wolfe structure inherited from the ergodic "nearly decomposable" assumption in the model. The second is an aggregation-disaggregation policy improvement algorithm.
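The aggregation-disaggregation idea can be illustrated for a single fixed stationary policy. The sketch below assumes the perturbed transition matrix has the form P(ε) = P + εD, where P is block-diagonal with irreducible blocks P_1, ..., P_n (one per subprocess) and D has zero row sums. It computes the limit as ε → 0 of the stationary distribution using the classical singular-perturbation result: take the stationary distribution μ_i of each block, build the aggregated generator G[i, j] = μ_i D_ij 1, solve νG = 0, and disaggregate block i as ν_i μ_i. The function names (`stationary`, `limit_stationary`) are hypothetical. This is not the authors' algorithm; it is a generic evaluation step of the kind such a scheme relies on. It also assumes the aggregated chain is irreducible, which is what the coupling assumption is meant to guarantee.

```python
import numpy as np


def _null_distribution(G):
    """Probability vector x with x @ G = 0 and sum(x) = 1 (G square, zero row sums)."""
    n = G.shape[0]
    # Stack the balance equations G^T x = 0 with the normalisation row 1^T x = 1.
    A = np.vstack([G.T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


def stationary(P):
    """Stationary distribution of an irreducible stochastic matrix P."""
    return _null_distribution(P - np.eye(P.shape[0]))


def limit_stationary(blocks, D):
    """Limit as eps -> 0 of the stationary distribution of blockdiag(blocks) + eps * D.

    blocks: list of irreducible stochastic matrices P_1, ..., P_n (hypothetical input layout).
    D: perturbation matrix with zero row sums, states ordered block by block.
    """
    sizes = [B.shape[0] for B in blocks]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    # Disaggregation data: stationary distribution of each isolated subprocess.
    mus = [stationary(B) for B in blocks]
    n = len(blocks)
    # Aggregated generator: G[i, j] = mu_i D_ij 1 (off-diagonal entries are the
    # first-order rates of moving from block i to block j; rows sum to zero).
    G = np.zeros((n, n))
    for i in range(n):
        rows = slice(offsets[i], offsets[i + 1])
        for j in range(n):
            cols = slice(offsets[j], offsets[j + 1])
            G[i, j] = mus[i] @ D[rows, cols].sum(axis=1)
    # Aggregated stationary distribution: nu G = 0, sum(nu) = 1.
    nu = _null_distribution(G)
    # Disaggregate: block i of the limit distribution is nu_i * mu_i.
    return np.concatenate([nu[i] * mus[i] for i in range(n)])
```

Under the same assumptions, the limiting average reward of the fixed policy would be `limit_stationary(blocks, D) @ r` for a state-reward vector `r`. A policy improvement scheme could evaluate candidate policies this way using only the small per-block and aggregated systems. It never needs to solve the ill-conditioned full system for a tiny ε.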