Abstract

A stochastic sequential control system has been studied as a Markovian Decision Process (M.D.P.), originally discussed by Bellman [1]. Two approaches to M.D.P.'s have been studied. One is the Policy Iteration Algorithm (P.I.A.), originally formulated by Howard [2] and extended by Blackwell [3], Veinott [4], and others. The other is the Linear Programming (L.P.) algorithm, originally formulated by Manne [5] and further discussed by Wolfe and Dantzig [6], Derman [7], D’Epenoux [8], De Ghellinck and Eppen [9], and others. These two approaches are known to be mutually dual in the sense of mathematical programming, i.e., they are equivalent. However, this duality is known only when the M.D.P. is discounted, completely ergodic with no discounting in the sense of [2], or terminating in the sense of [10]. No result has been established for a general M.D.P. in which there are some ergodic sets plus a transient set, and in which these sets may change with the policy chosen. In this paper we formulate a general M.D.P. with no discounting as an L.P. problem and give a procedure for solving it. We further show that P.I.A. is equivalent to this L.P. problem, i.e., P.I.A. is a specially structured algorithm of the revised L.P. in which pivot operations for many variables are performed simultaneously. An example is presented to illustrate this equivalence. We show that the L.P. problems formulated here contain those of completely ergodic and terminating M.D.P.'s as special cases. Finally, we extend the discussion to semi-Markovian decision processes (semi-M.D.P.'s).
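
To make the P.I.A./L.P. relation concrete, the following is a minimal sketch for the already-known discounted case only; it does not reproduce the general undiscounted formulation developed in the paper. It assumes a hypothetical two-state, two-action M.D.P. whose transition probabilities, rewards, and discount factor are illustrative numbers, and all function names (`solve`, `evaluate`, `q`, `policy_iteration`, `lp_slacks`) are introduced here for illustration. Policy iteration is run with exact rational arithmetic, and the resulting values are then checked against the constraints of the standard discounted L.P. (minimize the sum of v_i subject to v_i >= r_i^a + beta * sum_j p_ij^a v_j): every constraint should hold, and the constraint of the action chosen in each state should be tight.

```python
from fractions import Fraction as F

# Hypothetical discounted M.D.P. (illustrative numbers, not taken from the paper).
BETA = F(9, 10)                      # assumed discount factor
# P[i][a][j]: probability of moving from state i to state j under action a
P = [
    [[F(1, 2), F(1, 2)], [F(4, 5), F(1, 5)]],
    [[F(2, 5), F(3, 5)], [F(7, 10), F(3, 10)]],
]
# R[i][a]: expected one-step reward in state i under action a
R = [[F(6), F(4)], [F(-3), F(-5)]]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def evaluate(policy):
    """Value determination: solve (I - BETA * P_pi) v = r_pi."""
    n = len(P)
    A = [[(1 if i == j else 0) - BETA * P[i][policy[i]][j] for j in range(n)]
         for i in range(n)]
    b = [R[i][policy[i]] for i in range(n)]
    return solve(A, b)


def q(i, a, v):
    """One-step lookahead value of action a in state i given values v."""
    return R[i][a] + BETA * sum(P[i][a][j] * v[j] for j in range(len(P)))


def policy_iteration():
    policy = [0] * len(P)
    while True:
        v = evaluate(policy)
        new = []
        for i in range(len(P)):
            best = max(range(len(P[i])), key=lambda a: q(i, a, v))
            # Keep the current action on ties so the loop terminates.
            if q(i, best, v) == q(i, policy[i], v):
                best = policy[i]
            new.append(best)
        if new == policy:
            return policy, v
        policy = new


def lp_slacks(v):
    """Slack v_i - q(i, a, v) of each discounted-L.P. constraint."""
    return [[v[i] - q(i, a, v) for a in range(len(P[i]))]
            for i in range(len(P))]


policy, v = policy_iteration()
print("policy:", policy, "values:", [float(x) for x in v])
print("L.P. slacks:", [[float(s) for s in row] for row in lp_slacks(v)])
```

In this sketch the improvement step changes the action in every state at once, which is the discounted-case analogue of the simultaneous pivot operations mentioned above; a simplex method would instead exchange one basic variable per pivot.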
