Abstract
Cao's work shows that, by defining an α-dependent equivalent infinitesimal generator A_α, a semi-Markov decision process (SMDP) with either the average- or the discounted-cost criterion can be treated as an α-equivalent Markov decision process (MDP), and that performance potential theory can also be developed for SMDPs. In this work, we focus on establishing error bounds for potential- and A_α-based iterative optimization methods. First, we introduce an α-uniformized Markov chain (UMC) for an SMDP via A_α and a uniformization parameter, and establish the relations between them. In particular, we show that their performance potentials, as solutions of the corresponding Poisson equations, are proportional, so that the potential-based analyses of an SMDP and its α-UMC are unified. Using these relations, we derive error bounds for a potential-based policy-iteration algorithm and for a value-iteration algorithm when various computational errors are present. The results apply directly to the special cases of continuous-time MDPs and Markov chains, and can be extended to simulation-based optimization methods such as reinforcement learning and neuro-dynamic programming, where estimation and approximation errors are common. Finally, we present an application to the look-ahead control of a conveyor-serviced production station (CSPS) and illustrate the corresponding error bounds.
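As a rough illustration only (the notation below is ours, not the paper's exact definitions), the standard uniformization construction and the two associated Poisson equations can be sketched as follows. Here A denotes a generator, λ a uniformization rate with λ ≥ max_i |A_{ii}|, f a cost-rate vector, η the average cost, e the all-ones vector, and g, g_λ the potentials of the continuous-time process and the uniformized chain, respectively:

\[
P_\lambda = I + \tfrac{1}{\lambda} A, \qquad \lambda \ge \max_i |A_{ii}|,
\]
\[
A g + f - \eta e = 0, \qquad (I - P_\lambda)\, g_\lambda = \tfrac{1}{\lambda}\,(f - \eta e).
\]

Since I - P_λ = -(1/λ)A, the two Poisson equations determine the same potential up to the scaling induced by the choice of one-step cost, which is consistent with the proportionality between the SMDP and α-UMC potentials stated above.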