Linear programming considerations on Markovian Decision Processes with no discounting

Shunji Osaki, Hisashi Mine

doi:10.1016/0022-247x(69)90191-7

Open Access

https://doi.org/10.1016/0022-247x(69)90191-7

Journal: Journal of Mathematical Analysis and Applications
Publication Date: Apr 1, 1969
Citations: 9
License type: elsevier-specific: oa user license

Affiliation: Kyoto University

Abstract

A stochastic sequential control system has been studied as a Markovian Decision Process (M.D.P.), originally discussed by Bellman [1]. Two approaches to M.D.P.’s have been studied. One is the Policy Iteration Algorithm (P.I.A.), originally formulated by Howard [2] and extended by Blackwell [3], Veinott [4], and others. The other is the Linear Programming (L.P.) Algorithm, originally formulated by Manne [5]; Wolfe and Dantzig [6], Derman [7], D’Epenoux [8], De Ghellinck and Eppen [9], and others have also discussed the L.P. approach. It is also known that these two approaches are mutually dual in the sense of mathematical programming, i.e., they are equivalent. This duality is known only when the M.D.P. is discounted, completely ergodic with no discounting in the sense of [2], or terminating in the sense of [10]. No result has been established, however, for a general M.D.P. in which there are some ergodic sets plus a transient set, and in which these sets may change with the policy chosen. In this paper we formulate a general M.D.P. with no discounting as an L.P. problem and give a procedure for solving it. We further show that the P.I.A. is equivalent to this L.P. problem, i.e., the P.I.A. is a specially structured algorithm of the revised L.P. in which pivot operations on many variables are performed simultaneously. An example is presented to illustrate this equivalence. We show that the L.P. problems formulated here contain those of completely ergodic and terminating M.D.P.’s as special cases. Finally, we extend the discussion to semi-Markovian Decision Processes (semi-M.D.P.’s).
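
The relation between the two approaches can be sketched on the completely ergodic special case only; this is not the general formulation with several ergodic sets and a transient set that the paper develops. The minimal Python sketch below runs a Howard-style policy iteration on a two-state, two-action process and then checks that the stationary distribution of the resulting policy, read as L.P. variables x_i^k (zero for unused actions), satisfies the steady-state constraints and gives an objective equal to the gain. The transition probabilities `P`, the expected rewards `q`, and all function names are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction as F

# Illustrative data (assumed, not from the paper).
# P[i][k][j]: probability of moving from state i to state j under action k.
# q[i][k]:    expected immediate reward in state i under action k.
P = [
    [[F(1, 2), F(1, 2)], [F(4, 5), F(1, 5)]],
    [[F(2, 5), F(3, 5)], [F(7, 10), F(3, 10)]],
]
q = [[F(6), F(4)], [F(-3), F(-5)]]


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def evaluate(policy):
    """Value determination: solve g + v_i = q_i + sum_j p_ij v_j with v_last = 0."""
    n = len(P)
    A, b = [], []
    for i in range(n):
        p = P[i][policy[i]]
        # Unknowns are ordered as (g, v_0, ..., v_{n-2}).
        A.append([F(1)] + [(F(1) if i == j else F(0)) - p[j] for j in range(n - 1)])
        b.append(q[i][policy[i]])
    x = solve(A, b)
    return x[0], x[1:] + [F(0)]


def policy_iteration(policy):
    """Alternate value determination and policy improvement until the policy repeats."""
    while True:
        g, v = evaluate(policy)
        new_policy = []
        for i in range(len(P)):
            def test_quantity(k):
                return q[i][k] + sum(p * w for p, w in zip(P[i][k], v))
            best = max(range(len(P[i])), key=test_quantity)
            # Keep the current action on ties so the iteration terminates.
            if test_quantity(policy[i]) == test_quantity(best):
                best = policy[i]
            new_policy.append(best)
        if new_policy == policy:
            return policy, g, v
        policy = new_policy


def stationary(policy):
    """Stationary distribution: balance equations for all but one state, plus sum = 1."""
    n = len(P)
    A = [[(F(1) if i == j else F(0)) - P[i][policy[i]][j] for i in range(n)]
         for j in range(n - 1)]
    A.append([F(1)] * n)
    return solve(A, [F(0)] * (n - 1) + [F(1)])


policy, gain, values = policy_iteration([0, 0])
pi = stationary(policy)
lp_objective = sum(pi[i] * q[i][policy[i]] for i in range(len(P)))
print(policy, gain, pi, lp_objective)
```

Exact rational arithmetic via `fractions.Fraction` is a design choice made here so that the gain and the L.P. objective can be compared with `==` rather than with a floating-point tolerance.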
