Abstract

In this paper, we analyze the convergence of the dynamic programming value iteration algorithm using stability-theoretic results for discrete-time switched affine systems. For interval Markov decision processes subject to multiple objective functions, we reformulate the value iteration algorithm as a switched affine system with additive uncertainties. Building on this change of perspective, we adopt a Lyapunov-based strategy for designing a control policy: we seek a switching law that stabilizes the system towards an invariant set of attraction centered at a desired target value. These results provide insight into the convergence of the value iteration algorithm and into the feasibility of desired target values, and are demonstrated on two case studies.
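
As a rough illustration of this change of perspective (a minimal sketch using toy data and a plain, non-interval MDP of our own choosing, not the authors' formulation), value iteration under a fixed action choice is an affine map V_{k+1} = gamma * P_a V_k + r_a, and the greedy maximization at each iteration acts as the switching law selecting which affine map is applied:

```python
# Sketch (assumed toy data): value iteration written as a switched affine system
# V_{k+1} = A_sigma V_k + b_sigma, with A_a = gamma * P_a, b_a = r_a, and the
# greedy maximization playing the role of the switching law sigma.
import numpy as np

gamma = 0.9                       # discount factor (assumed)
# Two states, two actions: transition matrices P[a] and reward vectors r[a] (toy data).
P = np.array([[[0.8, 0.2],        # action 0
               [0.1, 0.9]],
              [[0.5, 0.5],        # action 1
               [0.6, 0.4]]])
r = np.array([[1.0, 0.0],         # rewards for action 0 in states 0 and 1
              [0.5, 2.0]])        # rewards for action 1 in states 0 and 1

V = np.zeros(2)                   # initial value vector
for k in range(200):
    # Each action a defines an affine map: Q[a] = r_a + gamma * P_a @ V.
    Q = r + gamma * P @ V         # shape (num_actions, num_states)
    sigma = Q.argmax(axis=0)      # switching law: greedy action per state
    V_next = Q[sigma, np.arange(2)]
    if np.max(np.abs(V_next - V)) < 1e-8:   # contraction drives convergence
        V = V_next
        break
    V = V_next

print("Fixed point V* ~", V, "with greedy switching law sigma =", sigma)
```

In the interval MDP setting of the paper, the transition probabilities are only known to lie within intervals; the abstract indicates this is captured as additive uncertainties on the affine maps, and the Lyapunov-based switching law then drives the iterates into an invariant set around the target value rather than to a single fixed point.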
