Abstract
The K-armed bandit problem is a formalization of the exploration-versus-exploitation dilemma, a well-known issue in stochastic optimization tasks. In a K-armed bandit problem, a player is confronted with a gambling machine with K arms, where each arm is associated with an unknown gain distribution, and the goal is to maximize the sum of the rewards (or minimize the sum of losses). Several approaches have been proposed in the literature to deal with the K-armed bandit problem. Most of them combine a greedy exploitation strategy with a random exploratory phase. This paper focuses on improving the exploration step by having recourse to the notion of probability of correct selection (PCS), a well-known notion in the simulation literature yet overlooked in the optimization domain. The rationale of our approach is to perform, at each exploration step, the arm sampling which maximizes the probability of selecting the optimal arm (i.e. the PCS) at the following step. This strategy is implemented by a bandit algorithm, called ε-PCSgreedy, which integrates the PCS exploration approach with the classical ε-greedy schema. A set of numerical experiments on artificial and real datasets shows that a more effective exploration may improve the performance of the entire bandit strategy.
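The following Python sketch illustrates one possible reading of this strategy under simplifying assumptions: each arm's reward estimate is treated as Gaussian, the PCS is estimated by Monte Carlo with a hypothetical helper `estimate_pcs`, and the exploration step pulls the arm whose additional sample would most increase the anticipated PCS. It is an illustrative sketch, not the authors' exact ε-PCSgreedy algorithm.

```python
import numpy as np

def estimate_pcs(means, sems, n_mc=2000, rng=None):
    """Monte Carlo estimate of the probability that the empirically best arm
    is truly the best, assuming Gaussian uncertainty N(mean, sem^2) per arm."""
    rng = np.random.default_rng() if rng is None else rng
    draws = rng.normal(means, sems, size=(n_mc, len(means)))
    best = int(np.argmax(means))
    return float(np.mean(np.argmax(draws, axis=1) == best))

def epsilon_pcs_greedy(arms, horizon, epsilon=0.1, seed=0):
    """Sketch of an epsilon-greedy bandit whose exploration step samples the
    arm that maximizes the estimated PCS at the following step."""
    rng = np.random.default_rng(seed)
    K = len(arms)
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    sq_sums = np.zeros(K)
    rewards = []

    def pull(k):
        r = arms[k](rng)           # arms[k] is a callable returning a reward
        counts[k] += 1
        sums[k] += r
        sq_sums[k] += r * r
        rewards.append(r)

    # Initial round-robin so every arm has at least two samples.
    for k in range(K):
        pull(k)
        pull(k)

    for _ in range(horizon - 2 * K):
        means = sums / counts
        var = np.maximum(sq_sums / counts - means ** 2, 1e-12)
        if rng.random() < epsilon:
            # Exploration: pick the arm whose extra sample (shrinking its
            # standard error from n to n+1 observations) gives the best PCS.
            scores = []
            for k in range(K):
                sems = np.sqrt(var / counts)
                sems[k] = np.sqrt(var[k] / (counts[k] + 1))
                scores.append(estimate_pcs(means, sems, rng=rng))
            pull(int(np.argmax(scores)))
        else:
            # Exploitation: greedy choice of the empirically best arm.
            pull(int(np.argmax(means)))
    return np.array(rewards)
```

As a usage example under the same assumptions, Gaussian arms can be passed as callables, e.g. `arms = [lambda rng, m=m: rng.normal(m, 1.0) for m in (0.0, 0.3, 0.5)]` followed by `epsilon_pcs_greedy(arms, horizon=500)`.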