Abstract

Many simulation-based learning algorithms have been developed to obtain near-optimal policies for Markov decision processes (MDPs) with large state spaces. However, most of them apply only to unichain problems. Since some applications involve multichain processes, and since it is NP-hard to determine whether an MDP is unichain, it is desirable to have an algorithm that is applicable to multichain problems as well. This paper presents a rollout algorithm for multichain MDPs under the average-cost criterion. A preliminary analysis of the estimation error and of the parameter settings is provided based on problem structure, i.e., the mixing time of the transition matrix. Ordinal optimization and Optimal Computing Budget Allocation are also suggested to improve the efficiency of the algorithm.
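To make the rollout idea concrete, the following is a minimal sketch (not the paper's algorithm): a one-step lookahead that tries each action, then follows a fixed base policy, and estimates the finite-horizon average cost of each choice by Monte Carlo simulation. The toy two-state MDP, the base policy, and all parameters below are illustrative assumptions.

```python
import random

# Toy 2-state MDP (an assumption for illustration, not the paper's model).
# State 0 is expensive, state 1 is cheap; action 1 moves toward state 1.
P1 = [[0.05, 0.95],   # P1[s][a]: probability that the next state is 1
      [0.98, 0.98]]
c = [[1.0, 1.5],      # c[s][a]: per-step cost
     [0.0, 0.5]]
base_policy = [0, 0]  # heuristic base policy: always pick action 0

def step(s, a, rng):
    """Sample the next state under action a."""
    return 1 if rng.random() < P1[s][a] else 0

def rollout_action(s, horizon=40, n_sims=200, seed=0):
    """One-step lookahead: apply each candidate action once, then follow the
    base policy, estimating the finite-horizon average cost by simulation."""
    rng = random.Random(seed)
    best_a, best_avg = None, float("inf")
    for a in range(2):
        total = 0.0
        for _ in range(n_sims):
            cost, cur = c[s][a], step(s, a, rng)
            for _ in range(horizon):
                a2 = base_policy[cur]
                cost += c[cur][a2]
                cur = step(cur, a2, rng)
            total += cost / (horizon + 1)  # per-step (average) cost of this run
        avg = total / n_sims
        if avg < best_avg:
            best_a, best_avg = a, avg
    return best_a
```

From the expensive state 0, the sketch prefers action 1, which pays a higher one-step cost but reaches the cheap state — the basic policy-improvement effect of rollout. The paper's contribution concerns how long such simulations must run (tied to the mixing time) for the estimates to be reliable in the multichain, average-cost setting.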
