Abstract
Stochastic multiarmed bandits (stochastic MABs) are a sequential decision-making problem with noisy rewards, in which an agent repeatedly chooses actions under unknown reward distributions to minimize cumulative regret. Most prior work on stochastic MABs assumes that the reward distribution of each action has bounded support or is light-tailed, i.e., sub-Gaussian. However, in a variety of decision-making problems, the rewards follow a heavy-tailed distribution. We therefore consider stochastic MABs with heavy-tailed rewards whose p-th moment is bounded by a constant ν_p for 1 < p ≤ 2. First, we analyze existing exploration methods under heavy-tailed rewards and prove that they do not guarantee a minimax optimal regret bound. Second, to achieve minimax optimality under heavy-tailed rewards, we propose minimax optimal robust upper confidence bound (MR-UCB), which builds on a tight confidence bound for a p-robust estimator. We also propose minimax optimal robust adaptively perturbed exploration (MR-APE), a randomized version of MR-UCB. In particular, unlike existing robust exploration methods, neither proposed method depends on ν_p. Third, we provide gap-dependent and gap-independent regret bounds for the proposed methods and prove that both guarantee the minimax optimal regret bound for the heavy-tailed stochastic MAB problem. To the best of our knowledge, the proposed methods are the first algorithms that theoretically guarantee minimax optimality under heavy-tailed rewards. Finally, we demonstrate the superiority of the proposed methods in terms of regret in simulations with Pareto and Fréchet noise.
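To make the setting concrete, the sketch below simulates a heavy-tailed bandit with recentred Pareto noise. It runs a truncated-mean robust UCB in the style of earlier robust exploration methods. It is not MR-UCB or MR-APE, whose estimator and confidence bound are not specified in this abstract. Unlike the proposed methods, it takes the moment bound as an input (`nu_p`). The function names, the constant 4 in the bonus, the confidence schedule and all parameter values are assumptions made for illustration.

```python
import math
import random


def pareto_reward(mean, alpha, rng):
    """Heavy-tailed reward: a Pareto(alpha) draw recentred to have the given mean."""
    # paretovariate(alpha) has support [1, inf) and mean alpha / (alpha - 1) for alpha > 1.
    return mean + rng.paretovariate(alpha) - alpha / (alpha - 1.0)


def truncated_mean(samples, nu_p, p, log_inv_delta):
    """Empirical mean that zeroes out samples above a growing threshold."""
    total = 0.0
    for s, x in enumerate(samples, start=1):
        threshold = (nu_p * s / log_inv_delta) ** (1.0 / p)
        if abs(x) <= threshold:
            total += x
    return total / len(samples)


def ucb_index(samples, nu_p, p, log_inv_delta):
    """Truncated mean plus a bonus shrinking like n^(-(p-1)/p)."""
    n = len(samples)
    bonus = 4.0 * nu_p ** (1.0 / p) * (log_inv_delta / n) ** ((p - 1.0) / p)
    return truncated_mean(samples, nu_p, p, log_inv_delta) + bonus


def robust_ucb(means, horizon, p=1.5, nu_p=8.5, alpha=1.8, seed=0):
    """Run the sketch and return (cumulative pseudo-regret, pulls per arm).

    Assumption: with alpha = 1.8 the noise has infinite variance but a finite
    p-th moment for p = 1.5; for means in [0, 1], E|X|^1.5 <= 1.25**1.5 * 6 < 8.5.
    """
    rng = random.Random(seed)
    k = len(means)
    samples = [[] for _ in range(k)]
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once before using the index
        else:
            log_inv_delta = 2.0 * math.log(t)  # confidence level delta = t^-2
            indices = [ucb_index(samples[a], nu_p, p, log_inv_delta) for a in range(k)]
            arm = max(range(k), key=indices.__getitem__)
        samples[arm].append(pareto_reward(means[arm], alpha, rng))
        regret += best - means[arm]
    return regret, [len(s) for s in samples]


if __name__ == "__main__":
    total_regret, pulls = robust_ucb([0.2, 0.8], horizon=2000)
    print("pseudo-regret:", total_regret, "pulls per arm:", pulls)
```

The bonus and the truncation threshold both scale with `nu_p`, so a loose moment bound directly inflates exploration. This is the dependence on ν_p that the abstract says MR-UCB and MR-APE avoid.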
More From: IEEE Transactions on Neural Networks and Learning Systems