Nonstationary value-iteration and adaptive control of discounted semi-Markov processes

Onésimo Hernández-Lerma

doi:10.1016/0022-247x(85)90253-7

Open Access

Journal: Journal of Mathematical Analysis and Applications
Publication Date: Dec 1, 1985
Citations: 6
License type: elsevier-specific: oa user license

Abstract

This paper considers discounted-reward semi-Markov decision processes with a denumerable state space that depend on unknown parameters. Given that the true parameter value is unknown, we address two problems: (I) give an iterative scheme to determine the maximal total discounted reward, and (II) find an asymptotically discount optimal (adaptive) policy. Our solutions are inspired by the nonstationary value iteration (NVI) scheme of Federgruen and Schweitzer (J. Optim. Theory Appl. 34 (1981), 207–241), combined with the ideas of Schäl (Preprint No. 428, Inst. Angew. Math. Univ. Bonn, 1981) concerning the “principle of estimation and control” for the adaptive control of semi-Markov processes.
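As a rough illustration of the idea behind problems (I) and (II), the sketch below runs a value-iteration recursion in which the model used at step n is built from the current parameter estimate rather than from the true parameter. It is not the construction from the paper: the two-state model, the function names, the estimate sequence, and the constant discount factor are all hypothetical, and a finite state space stands in for the denumerable one. In a semi-Markov setting the discount factor would generally depend on the sojourn time, which is why it is passed here as a per-state, per-action array.

```python
import numpy as np

def bellman_update(v, reward, trans, discount):
    """One value-iteration step under a given model.

    reward:   (S, A) expected one-stage rewards
    trans:    (S, A, S) transition probabilities
    discount: (S, A) effective discount factors, each in [0, 1)
    """
    q = reward + discount * (trans @ v)  # (S, A, S) @ (S,) -> (S, A)
    return q.max(axis=1), q.argmax(axis=1)

def model(theta):
    """Hypothetical two-state, two-action model with parameter theta in (0, 1)."""
    reward = np.array([[1.0, 0.0], [0.0, 2.0]])
    trans = np.array([
        [[theta, 1 - theta], [1 - theta, theta]],
        [[0.5, 0.5], [theta, 1 - theta]],
    ])
    discount = np.full((2, 2), 0.9)  # assumed constant for simplicity
    return reward, trans, discount

def nonstationary_value_iteration(estimates, n_states=2):
    """Apply the Bellman operator of the n-th estimated model at step n."""
    v = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for theta_n in estimates:
        v, policy = bellman_update(v, *model(theta_n))
    return v, policy

def value_iteration(theta, iters=2000, n_states=2):
    """Ordinary value iteration with the parameter known, for comparison."""
    v = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        v, policy = bellman_update(v, *model(theta))
    return v, policy

true_theta = 0.7
# Assumed estimate sequence that converges to the true parameter.
estimates = [true_theta + 0.2 / (n + 1) for n in range(2000)]

v_nvi, policy_nvi = nonstationary_value_iteration(estimates)
v_true, policy_true = value_iteration(true_theta)
gap = float(np.max(np.abs(v_nvi - v_true)))
print("max gap between NVI and known-parameter values:", gap)
```

In this toy setting the gap shrinks as the estimates approach the true parameter. The maximizing actions computed at each step suggest how an adaptive policy in the spirit of "estimation and control" might be chosen: act greedily with respect to the current estimated model.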
