Convergence of Value Functions for Finite Horizon Markov Decision Processes with Constraints

Naoyuki Ichihara

doi:10.1007/s00245-020-09707-x

Abstract

This paper is concerned with finite horizon countable state Markov decision processes (MDPs) having an absorbing set as a constraint. Convergence of value iteration is discussed to investigate the asymptotic behavior of value functions as the time horizon tends to infinity. It turns out that the value function exhibits three different limiting behaviors according to the critical value $\lambda_*$, the so-called generalized principal eigenvalue, of the associated ergodic problem. Specifically, we prove that (i) if $\lambda_* < 0$, then the value function converges to a solution to the corresponding stationary equation; (ii) if $\lambda_* > 0$, then, after a suitable normalization, it approaches a solution to the corresponding ergodic problem; (iii) if $\lambda_* = 0$, then it diverges to infinity with, at most, a logarithmic order. We employ this convergence result to examine qualitative properties of the optimal Markovian policy for a finite horizon MDP when the time horizon is sufficiently large.
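
As a rough illustration of what "value iteration as the time horizon grows" means, the following is a minimal sketch of backward induction on a toy three-state MDP with one absorbing state. It is not the paper's model: the transition matrices `P`, the running costs `cost`, the cost-minimizing convention, the choice to pin the value to zero on the absorbing set, and the function name `value_iteration` are all assumptions made for this example, and the sketch does not compute or depend on $\lambda_*$.

```python
import numpy as np


def value_iteration(P, cost, absorbing, horizon):
    """Finite-horizon backward induction (illustrative, hypothetical setup).

    P        : array (n_actions, n_states, n_states), row-stochastic per action
    cost     : array (n_actions, n_states), one-step running cost
    absorbing: list of state indices whose value is pinned to 0
    horizon  : number of backward steps
    Returns V with V[n, x] = optimal expected cost with n steps to go.
    """
    n_states = P.shape[1]
    V = np.zeros((horizon + 1, n_states))
    for n in range(horizon):
        # Q[a, x] = cost[a, x] + sum_y P[a, x, y] * V[n, y]
        Q = cost + P @ V[n]
        V[n + 1] = Q.min(axis=0)
        V[n + 1, absorbing] = 0.0  # assumed boundary condition on the absorbing set
    return V


# Toy data (assumed): state 0 is absorbing, states 1 and 2 are transient.
P = np.array([
    [[1.0, 0.0, 0.0],   # action 0
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]],
    [[1.0, 0.0, 0.0],   # action 1
     [0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5]],
])
cost = np.array([
    [0.0, 1.0, 1.0],    # action 0
    [0.0, 1.0, 1.5],    # action 1
])

V = value_iteration(P, cost, absorbing=[0], horizon=200)
```

In this toy problem every transient state reaches the absorbing state with positive probability, so the iterates `V[n]` increase with `n` and settle at a fixed point of the one-step update. That is only loosely reminiscent of the convergent regime described in the abstract; the other two regimes would need a model in which the value keeps growing with the horizon.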

Journal: Applied Mathematics & Optimization
Publication Date: Aug 4, 2020
Citations: 1

Similar Papers

Process control using finite Markov chains with iterative clustering
Enso Ikonen ... István Selek
Computers & Chemical Engineering | VOL. 93
01 Jul 2016

Piecewise Linear Approximations for Partially Observable Markov Decision Processes with Finite Horizons
Douglas J White
Journal of Information and Optimization Sciences | VOL. 13
01 May 1992

Aggregation-disaggregation algorithm for $\varepsilon^2$-singularly perturbed limiting average Markov control problems
M Abbad ... J.A Filar
11 Dec 1991

Robust Capacity Control Choice Based Behavior
Lianjun Song
29 Jul 2009
