Bounds for the regret loss in dynamic programming under adaptive control

M Kolonko

doi:10.1007/bf01916897

Journal: Zeitschrift für Operations Research
Publication Date: Dec 1, 1983
Citations: 8

Affiliation: Karlsruhe Institute of Technology

#Expected Estimation Error #Adaptive Procedure

Abstract

We consider a Markovian dynamic programming model in which the transition probabilities depend on an unknown parameter θ. We estimate the unknown θ and adapt the control action to the estimated value. Bounds are given for the expected regret loss under this adaptive procedure, i.e., for the loss caused by using the adaptive procedure instead of an (unknown) optimal one. We assume that the dependence of the model on θ is Lipschitz continuous. The bounds depend on the expected estimation error. When confidence intervals for θ with fixed width are available, we obtain bounds for the expected regret loss that hold uniformly in θ.
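
The idea of the regret loss can be illustrated on a toy example. The sketch below is not taken from the paper: the two-state model, the discount factor GAMMA, the reward, and all function names are assumptions chosen only for illustration. It computes the loss incurred when the policy that is optimal for an estimate theta_hat is used in the model with true parameter theta. In this particular toy model the loss is bounded by GAMMA / (1 - GAMMA) times the estimation error, which mirrors the kind of Lipschitz-type bound the abstract describes.

```python
GAMMA = 0.9          # discount factor (assumed for this toy model)
STATES = (0, 1)
ACTIONS = (0, 1)     # 0 = "safe", 1 = "risky"


def p_next1(action, theta):
    """Probability of moving to state 1; only the risky action depends on theta."""
    return theta if action == 1 else 0.5


def reward(state):
    """Reward 1 is earned in state 1, nothing in state 0."""
    return 1.0 if state == 1 else 0.0


def q_value(state, action, v, theta):
    """One-step lookahead value of an action under parameter theta."""
    p1 = p_next1(action, theta)
    return reward(state) + GAMMA * ((1 - p1) * v[0] + p1 * v[1])


def solve(theta, iters=500):
    """Value iteration; returns the optimal values and a greedy policy."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [max(q_value(s, a, v, theta) for a in ACTIONS) for s in STATES]
    policy = [max(ACTIONS, key=lambda a: q_value(s, a, v, theta)) for s in STATES]
    return v, policy


def evaluate(policy, theta, iters=500):
    """Value of a fixed policy in the model with parameter theta."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [q_value(s, policy[s], v, theta) for s in STATES]
    return v


def regret(theta, theta_hat):
    """Worst-case (over start states) loss from acting optimally for theta_hat
    while the true parameter is theta."""
    v_opt, _ = solve(theta)
    _, policy_hat = solve(theta_hat)
    v_hat = evaluate(policy_hat, theta)
    return max(v_opt[s] - v_hat[s] for s in STATES)


if __name__ == "__main__":
    for theta_hat in (0.3, 0.45, 0.55, 0.6):
        print(theta_hat, round(regret(0.6, theta_hat), 4))
```

Here the estimate is held fixed, whereas an adaptive procedure would update theta_hat from observed transitions and re-solve the model at each stage. The sketch only shows how a single estimation error translates into a loss in value.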

Full Text