Abstract

We are concerned with Markov decision processes with Borel state and action spaces; the transition law and the reward function depend on anunknown parameter. In this framework, we study therecursive adaptive nonstationary value iteration policy, which is proved to be optimal under thesame conditions usually imposed to obtain the optimality of other well-knownnonrecursive adaptive policies. The results are illustrated by showing the existence of optimal adaptive policies for a class of additive-noise systems with unknown noise distribution.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call