Abstract

The principle of estimation and control was introduced and studied independently by Kurano and Mandl under the average return criterion for models in which some of the data depend on an unknown parameter. Kurano and Mandl considered Markov decision models with finite state space and bounded rewards, and established conditions for the existence of a policy, based on a consistent estimator of the unknown parameter, that is optimal uniformly in the parameter. These results were extended by Kolonko to semi-Markov models with denumerable state space and unbounded rewards. The present paper considers the same principle of estimation and control for the discounted return criterion. The underlying semi-Markov decision model may have a denumerable state space and unbounded rewards. Conditions are established for the existence of a policy that is asymptotically discount optimal uniformly in the unknown parameter. The essential conditions are continuity and compactness conditions together with a multiplicative form of the Foster criterion for positive recurrence of Markov chains, formulated here for Markov decision models. An application to the control of an M|G|1 queue is discussed.
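
For orientation, the following is a minimal sketch of what a multiplicative (geometric) Foster drift condition adapted to a decision model typically looks like; the weight function w, the constants alpha and b, and the finite set K are illustrative assumptions, and the paper's exact formulation may differ.

% Multiplicative Foster criterion for a Markov decision model (illustrative sketch).
% Assumed ingredients: state space S, admissible action sets A(i),
% and transition probabilities p(j | i, a).
% Suppose there exist a weight function w : S -> [1, infinity),
% a constant 0 < alpha < 1, a finite set K of states, and a constant b < infinity
% such that the expected weight drifts down by the factor alpha outside K,
% uniformly over actions:
\[
  \sum_{j \in S} p(j \mid i, a)\, w(j) \;\le\; \alpha\, w(i) + b\, \mathbf{1}_{K}(i)
  \qquad \text{for all } i \in S,\ a \in A(i).
\]
% The uniform geometric drift toward the finite set K is the standard route to
% positive recurrence of the controlled chain, and the weight w also serves to
% dominate unbounded rewards.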
