Abstract

Thompson sampling has recently been shown to achieve good theoretical performance guarantees for stochastic control problems with parameter uncertainty when the state, control, and parameter spaces are all finite. Much less is known, however, about its performance on continuous or more general spaces, which cover an important class of problems in practice. In this paper, we study Thompson sampling applied to a broad class of average-cost stochastic control problems in which the state, control, and parameter spaces are all general measurable spaces. Our main contributions are theoretical performance guarantees for Thompson sampling under two measures: first, expected posterior sampling error; and second, average per-period regret.
