First-Order Bayesian Regret Analysis of Thompson Sampling

Sebastien Bubeck, Mark Sellke

doi:10.1109/tit.2022.3213630

Abstract

We address online combinatorial optimization when the player has a prior over the adversary’s sequence of losses. In this setting, Russo and Van Roy proposed an information-theoretic analysis of Thompson Sampling based on the information ratio, allowing for elegant proofs of Bayesian regret bounds. In this paper we introduce three novel ideas to this line of work. First, we propose a new quantity, the scale-sensitive information ratio, which allows us to obtain more refined first-order regret bounds (i.e., bounds of the form $O(\sqrt{L^{*}})$, where $L^{*}$ is the loss of the best combinatorial action). Second, we replace the entropy over combinatorial actions by a coordinate entropy, which allows us to obtain the first optimal worst-case bound for Thompson Sampling in the combinatorial setting. We additionally introduce a novel link between Bayesian agents and frequentist confidence intervals. Combining these ideas, we show that the classical multi-armed bandit first-order regret bound $\widetilde{O}(\sqrt{d L^{*}})$ still holds true in the more challenging and more general semi-bandit scenario. This latter result improves the previous state-of-the-art bound $\widetilde{O}(\sqrt{(d+m^{3})L^{*}})$ by Lykouris, Sridharan and Tardos. Moreover, we sharpen these results with two technical ingredients. The first leverages a recent insight of Zimmert and Lattimore to replace Shannon entropy with more refined potential functions in the analysis. The second is a Thresholded Thompson Sampling algorithm, which slightly modifies the original algorithm by never playing low-probability actions. This thresholding results in fully $T$-independent regret bounds when $L^{*}\leq \overline{L}^{*}$ is almost surely upper-bounded, which we show does not hold for ordinary Thompson Sampling.
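To make the idea of "never playing low-probability actions" concrete, the following is a minimal Python sketch, not the algorithm as specified in the paper. It assumes a much simpler setting than the one analyzed above: a stochastic multi-armed bandit with Bernoulli losses and independent Beta posteriors, rather than a prior over an adversary's loss sequence. The names `optimal_arm_probabilities`, `thresholded_probabilities`, `run`, and the threshold parameter `gamma` are hypothetical and chosen for illustration only; the rule of zeroing out arms below `gamma` and renormalizing is likewise an assumption about how such a threshold could be applied.

```python
import numpy as np


def optimal_arm_probabilities(alpha, beta, rng, n_samples=2000):
    """Monte Carlo estimate of P(arm i has the smallest mean loss).

    Assumes independent Beta(alpha[i], beta[i]) posteriors on each
    arm's mean loss (an illustrative assumption).
    """
    samples = rng.beta(alpha, beta, size=(n_samples, len(alpha)))
    best = samples.argmin(axis=1)
    return np.bincount(best, minlength=len(alpha)) / n_samples


def thresholded_probabilities(probs, gamma):
    """Drop arms whose probability is below gamma, then renormalize."""
    kept = np.where(probs >= gamma, probs, 0.0)
    if kept.sum() == 0.0:
        # Fallback (assumption): if every arm is below the threshold,
        # play the single most likely arm.
        kept = np.zeros_like(probs)
        kept[probs.argmax()] = 1.0
    return kept / kept.sum()


def run(mean_losses, horizon, gamma=0.0, seed=0):
    """Play `horizon` rounds and return the total loss incurred.

    With gamma = 0 no arm is dropped, so each arm is played with its
    estimated posterior probability of being optimal, as in ordinary
    Thompson Sampling.
    """
    rng = np.random.default_rng(seed)
    d = len(mean_losses)
    alpha = np.ones(d)  # 1 + number of observed losses equal to 1
    beta = np.ones(d)   # 1 + number of observed losses equal to 0
    total_loss = 0.0
    for _ in range(horizon):
        probs = optimal_arm_probabilities(alpha, beta, rng)
        play = thresholded_probabilities(probs, gamma)
        arm = rng.choice(d, p=play)
        loss = float(rng.random() < mean_losses[arm])
        alpha[arm] += loss
        beta[arm] += 1.0 - loss
        total_loss += loss
    return total_loss
```

Calling `run([0.1, 0.5, 0.9], horizon=200, gamma=0.05)` would simulate the thresholded variant on three arms, while `gamma=0.0` recovers the unthresholded sampler for comparison. The sketch only illustrates the mechanics of thresholding; it says nothing about the regret guarantees stated in the abstract.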

Journal: IEEE Transactions on Information Theory	Publication Date: Mar 1, 2023
Citations: 3
