Abstract
The relative value iteration (RVI) scheme for Markov decision processes (MDPs) dates back to the seminal work of White (1963), which introduced an algorithm for solving the ergodic dynamic programming equation in the finite state, finite action case. Its ramifications have given rise to popular learning algorithms such as Q-learning. More recently, the algorithm has gained prominence because of its implications for model predictive control (MPC). For stochastic control problems on an infinite time horizon, especially for problems that seek to optimize the average performance (ergodic control), obtaining the optimal policy in explicit form is possible only for a few classes of well-structured models. What is often used in practice is a heuristic method called the rolling horizon, or receding horizon, or MPC. This works as follows: one solves the finite horizon problem for a given number of steps N, or for an interval [0, T] in the case of a continuous time problem. The result is a nonstationary Markov policy that is optimal for the finite horizon problem. We fix the initial action (the action determined at the Nth step of the value iteration (VI) algorithm) and apply it as a stationary Markov control. We refer to this Markov control as the rolling horizon control. It of course depends on the length of the horizon N. One expects that for well-structured problems, if N is sufficiently large, then the rolling horizon control is near optimal. This is, of course, only a heuristic: the rolling horizon control might not even be stable. For a good discussion of this problem, we refer the reader to Della Vecchia et al. (2012). Obtaining such solutions is further complicated by the fact that the value of the ergodic cost required in the successive iteration scheme is not known. This is precisely what the RVI addresses.
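To make the two constructions above concrete, here is a minimal Python sketch for a finite-state, finite-action MDP with average (ergodic) cost. It assumes the model is given by a transition tensor P[a, x, y] and a one-step cost matrix c[x, a]; the function names and parameters are illustrative and not taken from the paper.

import numpy as np

def rolling_horizon_policy(P, c, N):
    """Run N steps of value iteration and return, at each state, the action
    determined at the Nth step; applied as a stationary Markov control, this
    is the rolling (receding) horizon control described above."""
    assert N >= 1
    A, S, _ = P.shape
    V = np.zeros(S)                            # terminal value V_0 = 0
    for _ in range(N):
        Q = c + np.einsum('axy,y->xa', P, V)   # Q[x, a] = c(x, a) + E[V(next state)]
        V = Q.min(axis=1)
    return Q.argmin(axis=1)                    # initial action of the N-horizon policy

def relative_value_iteration(P, c, ref=0, tol=1e-8, max_iter=10_000):
    """Relative value iteration in the spirit of White (1963): subtract the
    value at a reference state each step, so the iterates stay bounded and
    the subtracted offset estimates the unknown optimal ergodic cost."""
    A, S, _ = P.shape
    h = np.zeros(S)                            # relative value function
    rho = 0.0
    for _ in range(max_iter):
        Q = c + np.einsum('axy,y->xa', P, h)
        Th = Q.min(axis=1)
        rho = Th[ref]                          # running estimate of the ergodic cost
        h_new = Th - rho                       # the "relative" normalization step
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return rho, h, Q.argmin(axis=1)

In this sketch the normalization by the value at a fixed reference state stands in for the unknown ergodic cost, which is exactly the role the RVI plays in the successive iteration scheme.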
History: This paper was accepted for the Stochastic Systems Special Section on Open Problems in Applied Probability, presented at the 2018 INFORMS Annual Meeting in Phoenix, Arizona, November 4–7, 2018.
To cite this article: Ari Arapostathis (2019) Open Problem—Convergence and Asymptotic Optimality of the Relative Value Iteration in Ergodic Control. Stochastic Systems.