Tuning Bandit Algorithms in Stochastic Environments

Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári

doi:10.1007/978-3-540-75225-7_15

Abstract

Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and provide theoretical guidelines for the tuning of the parameters of these algorithms. For this we analyze the expected regret and for the first time the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that except for some very special bandit problems, the regret, for upper confidence bounds based algorithms with standard bias sequences, concentrates only at a polynomial rate. Hence, although these algorithms achieve logarithmic expected regret rates, they seem less attractive when the risk of suffering much worse than logarithmic regret is also taken into account.
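To make the idea of a variance-aware upper confidence bound concrete, here is a minimal Python sketch. The abstract does not state the index formula, so everything below is an assumption for illustration: the index is the empirical mean, plus an empirical-Bernstein-style term that scales with the empirical variance, plus a correction term that scales with the payoff range. The names `ucbv_index` and `run_bandit` are hypothetical, as are the parameters `b` (payoff range), `zeta` (exploration rate) and `c`, and the default values chosen for them.

```python
import math
import random


def ucbv_index(mean, var, pulls, t, b=1.0, zeta=1.2, c=1.0):
    """Variance-aware upper confidence index for one arm (illustrative form).

    mean, var: empirical mean and variance of the arm's observed payoffs
    pulls:     number of times the arm has been played so far (>= 1)
    t:         current round (>= 2, so that log(t) > 0)
    b, zeta, c: assumed payoff range and tuning constants
    """
    e = zeta * math.log(t)  # exploration function, grows logarithmically in t
    return mean + math.sqrt(2.0 * var * e / pulls) + c * 3.0 * b * e / pulls


def run_bandit(arms, horizon, seed=0):
    """Play `horizon` rounds and return the pull count of each arm.

    arms: list of callables; each takes a random.Random and returns a payoff in [0, b].
    """
    rng = random.Random(seed)
    k = len(arms)
    pulls = [0] * k
    sums = [0.0] * k      # running sum of payoffs per arm
    sq_sums = [0.0] * k   # running sum of squared payoffs per arm

    def index(i, t):
        mean = sums[i] / pulls[i]
        # Biased empirical variance, clipped at 0 to absorb rounding error.
        var = max(sq_sums[i] / pulls[i] - mean * mean, 0.0)
        return ucbv_index(mean, var, pulls[i], t)

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once before using the index
        else:
            arm = max(range(k), key=lambda i: index(i, t))
        x = arms[arm](rng)
        pulls[arm] += 1
        sums[arm] += x
        sq_sums[arm] += x * x
    return pulls


if __name__ == "__main__":
    # Optimal arm: Bernoulli(0.6). Suboptimal arm: constant 0.5 (zero variance).
    demo_arms = [
        lambda rng: 1.0 if rng.random() < 0.6 else 0.0,
        lambda rng: 0.5,
    ]
    print(run_bandit(demo_arms, horizon=5000))
```

In the demo, the suboptimal arm has zero variance, so its variance term vanishes and its index shrinks at rate log(t)/s rather than sqrt(log(t)/s). This is the situation the abstract singles out as favorable for variance estimates. The sketch illustrates the mechanism only and says nothing about the concentration of the regret.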

