Abstract
The K-armed bandit problem is a formalization of the exploration-versus-exploitation dilemma, a well-known issue in stochastic optimization tasks. In a K-armed bandit problem, a player is confronted with a gambling machine with K arms, where each arm is associated with an unknown gain distribution, and the goal is to maximize the sum of the rewards (or minimize the sum of losses). Several approaches have been proposed in the literature to deal with the K-armed bandit problem. Most of them combine a greedy exploitation strategy with a random exploratory phase. This paper focuses on improving the exploration step by having recourse to the notion of probability of correct selection (PCS), a well-known notion in the simulation literature yet overlooked in the optimization domain. The rationale of our approach is to perform, at each exploration step, the arm sampling which maximizes the probability of selecting the optimal arm (i.e. the PCS) at the following step. This strategy is implemented by a bandit algorithm, called ε-PCSgreedy, which integrates the PCS exploration approach with the classical ε-greedy schema. A set of numerical experiments on artificial and real datasets shows that a more effective exploration may improve the performance of the entire bandit strategy.
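The following Python sketch illustrates one possible reading of this strategy under simplifying assumptions: each arm's reward estimate is treated as Gaussian, the PCS is estimated by Monte Carlo with a hypothetical helper `estimate_pcs`, and the exploration step pulls the arm whose additional sample would most increase the anticipated PCS. It is an illustrative sketch, not the authors' exact ε-PCSgreedy algorithm.

```python
import numpy as np

def estimate_pcs(means, sems, n_mc=2000, rng=None):
    """Monte Carlo estimate of the probability that the empirically best arm
    is truly the best, assuming Gaussian uncertainty N(mean, sem^2) per arm."""
    rng = np.random.default_rng() if rng is None else rng
    draws = rng.normal(means, sems, size=(n_mc, len(means)))
    best = int(np.argmax(means))
    return float(np.mean(np.argmax(draws, axis=1) == best))

def epsilon_pcs_greedy(arms, horizon, epsilon=0.1, seed=0):
    """Sketch of an epsilon-greedy bandit whose exploration step samples the
    arm that maximizes the estimated PCS at the following step."""
    rng = np.random.default_rng(seed)
    K = len(arms)
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    sq_sums = np.zeros(K)
    rewards = []

    def pull(k):
        r = arms[k](rng)           # arms[k] is a callable returning a reward
        counts[k] += 1
        sums[k] += r
        sq_sums[k] += r * r
        rewards.append(r)

    # Initial round-robin so every arm has at least two samples.
    for k in range(K):
        pull(k)
        pull(k)

    for _ in range(horizon - 2 * K):
        means = sums / counts
        var = np.maximum(sq_sums / counts - means ** 2, 1e-12)
        if rng.random() < epsilon:
            # Exploration: pick the arm whose extra sample (shrinking its
            # standard error from n to n+1 observations) gives the best PCS.
            scores = []
            for k in range(K):
                sems = np.sqrt(var / counts)
                sems[k] = np.sqrt(var[k] / (counts[k] + 1))
                scores.append(estimate_pcs(means, sems, rng=rng))
            pull(int(np.argmax(scores)))
        else:
            # Exploitation: greedy choice of the empirically best arm.
            pull(int(np.argmax(means)))
    return np.array(rewards)
```

As a usage example under the same assumptions, Gaussian arms can be passed as callables, e.g. `arms = [lambda rng, m=m: rng.normal(m, 1.0) for m in (0.0, 0.3, 0.5)]` followed by `epsilon_pcs_greedy(arms, horizon=500)`.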