Abstract

We consider a Markovian dynamic programming model in which the transition probabilities depend on an unknown parameter θ. We estimate the unknown θ and adapt the control action to the estimated value. Bounds are given for the expected regret loss under this adaptive procedure, i.e., for the loss caused by using the adaptive procedure instead of an (unknown) optimal one. We assume that the dependence of the model on θ is Lipschitz continuous. The bounds depend on the expected estimation error. When confidence intervals for θ with fixed width are available, we obtain bounds for the expected regret loss that hold uniformly in θ.
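The estimate-then-adapt idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a two-state, two-action model whose transition law is linear (hence Lipschitz) in θ, a reward of 1 in state 0 and 0 in state 1, a discount factor `beta`, a relative-frequency estimator, and the function names themselves. The control is chosen by solving the model with θ replaced by its current estimate.

```python
# Illustrative toy model only; none of these names or numbers come from the paper.

REWARD = [1.0, 0.0]  # assumed reward collected in state 0 and state 1


def transition_probs(theta, action):
    """Hypothetical transition law: [P(next = 0), P(next = 1)].

    Action 0 reaches state 0 with probability theta (the unknown parameter);
    action 1 reaches it with probability 0.5, independently of theta.
    """
    p0 = theta if action == 0 else 0.5
    return [p0, 1.0 - p0]


def estimate_theta(next_states):
    """Relative frequency of landing in state 0 after playing action 0."""
    return sum(1 for s in next_states if s == 0) / len(next_states)


def certainty_equivalent_policy(theta_hat, beta=0.9, iters=500):
    """Value iteration on the model with theta replaced by its estimate.

    Returns the greedy action for each state under the estimated model.
    """
    values = [0.0, 0.0]
    for _ in range(iters):
        # q[s][a]: reward in s plus discounted expected value after action a
        q = [
            [
                REWARD[s]
                + beta * sum(
                    p * v for p, v in zip(transition_probs(theta_hat, a), values)
                )
                for a in (0, 1)
            ]
            for s in (0, 1)
        ]
        values = [max(row) for row in q]
    return [row.index(max(row)) for row in q]


# Observed successors of action 0 (made-up data): three of four reached state 0.
theta_hat = estimate_theta([0, 0, 1, 0])
policy = certainty_equivalent_policy(theta_hat)
```

In this toy model action 0 is preferable exactly when θ exceeds 0.5, so an estimate that falls on the wrong side of 0.5 selects a suboptimal action. This is the kind of loss that bounds expressed through the estimation error are meant to control.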
