Abstract
The relative value iteration (RVI) scheme for Markov decision processes (MDPs) dates back to the seminal work of White (1963), which introduced an algorithm for solving the ergodic dynamic programming equation in the finite state, finite action case. Its ramifications have given rise to popular learning algorithms such as Q-learning. More recently, the algorithm has gained prominence because of its implications for model predictive control (MPC). For stochastic control problems on an infinite time horizon, especially for problems that seek to optimize the average performance (ergodic control), obtaining the optimal policy in explicit form is possible only for a few classes of well-structured models. What is often used in practice is a heuristic method called the rolling horizon, or receding horizon, or MPC. This works as follows: one solves the finite horizon problem for a given number of steps N, or for an interval [0, T] in the case of a continuous time problem. The result is a nonstationary Markov policy that is optimal for the finite horizon problem. We fix the initial action (the action determined at the Nth step of the value iteration (VI) algorithm) and apply it as a stationary Markov control. We refer to this Markov control as the rolling horizon control. It of course depends on the length of the horizon N. One expects that for well-structured problems, if N is sufficiently large, then the rolling horizon control is near optimal. This is, of course, only a heuristic: the rolling horizon control might not even be stable. For a good discussion of this problem, we refer the reader to Della Vecchia et al. (2012). Obtaining such solutions is further complicated by the fact that the value of the ergodic cost required in the successive iteration scheme is not known. This is precisely what the RVI addresses.
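To make the two constructions above concrete, here is a minimal Python sketch for a finite-state, finite-action MDP with average (ergodic) cost. It assumes the model is given by a transition tensor P[a, x, y] and a one-step cost matrix c[x, a]; the function names and parameters are illustrative and not taken from the paper.

import numpy as np

def rolling_horizon_policy(P, c, N):
    """Run N steps of value iteration and return, at each state, the action
    determined at the Nth step; applied as a stationary Markov control, this
    is the rolling (receding) horizon control described above."""
    assert N >= 1
    A, S, _ = P.shape
    V = np.zeros(S)                            # terminal value V_0 = 0
    for _ in range(N):
        Q = c + np.einsum('axy,y->xa', P, V)   # Q[x, a] = c(x, a) + E[V(next state)]
        V = Q.min(axis=1)
    return Q.argmin(axis=1)                    # initial action of the N-horizon policy

def relative_value_iteration(P, c, ref=0, tol=1e-8, max_iter=10_000):
    """Relative value iteration in the spirit of White (1963): subtract the
    value at a reference state each step, so the iterates stay bounded and
    the subtracted offset estimates the unknown optimal ergodic cost."""
    A, S, _ = P.shape
    h = np.zeros(S)                            # relative value function
    rho = 0.0
    for _ in range(max_iter):
        Q = c + np.einsum('axy,y->xa', P, h)
        Th = Q.min(axis=1)
        rho = Th[ref]                          # running estimate of the ergodic cost
        h_new = Th - rho                       # the "relative" normalization step
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return rho, h, Q.argmin(axis=1)

In this sketch the normalization by the value at a fixed reference state stands in for the unknown ergodic cost, which is exactly the role the RVI plays in the successive iteration scheme.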
History: This paper was accepted for the Stochastic Systems Special Section on Open Problems in Applied Probability, presented at the 2018 INFORMS Annual Meeting in Phoenix, Arizona, November 4–7, 2018.
To cite this article: Ari Arapostathis (2019) Open Problem—Convergence and Asymptotic Optimality of the Relative Value Iteration in Ergodic Control. Stochastic Systems.