Multi-armed Bandit with Sub-exponential Rewards

Huiwen Jia, Siqian Shen, Cong Shi

doi:10.2139/ssrn.3926846

Abstract

We consider a general class of multi-armed bandit (MAB) problems with sub-exponential rewards. This is primarily motivated by service systems with exponential inter-arrival and service distributions. It is well known that the celebrated Upper Confidence Bound (UCB) algorithm can achieve a tight regret bound for MAB under sub-Gaussian rewards. Subsequent work by Bubeck et al. (2013) extended this tightness result to any reward distribution with finite variance by leveraging robust mean estimators. In this paper, we present three alternative UCB-based algorithms, termed UCB-Rad, UCB-Warm, and UCB-Hybrid, specifically for MAB with sub-exponential rewards. While not the first to achieve tight regret bounds, these algorithms are conceptually simpler and provide a more explicit analysis for this problem. Moreover, we present a rental bike revenue management application and conduct numerical experiments. We find that UCB-Warm and UCB-Hybrid outperform UCB-Rad in our computational experiments.
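For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the classical UCB1 index policy, run on arms with exponentially distributed rewards. It is not UCB-Rad, UCB-Warm, or UCB-Hybrid, whose confidence radii are not given in this abstract. The function name `ucb1`, the bonus term `sqrt(2 ln t / n)`, the arm means, and the seed are all assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Classical UCB1 on arms with exponential rewards (illustrative only).

    arm_means: assumed true mean reward of each arm (hypothetical values).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # pulls per arm
    sums = [0.0] * k    # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest empirical mean plus bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Exponential reward with the given mean (rate = 1 / mean).
        reward = rng.expovariate(1.0 / arm_means[arm])
        counts[arm] += 1
        sums[arm] += reward

    return counts
```

Calling `ucb1([0.5, 1.0, 2.0], 500)` returns a list of three pull counts that sum to 500. The standard bonus used here is derived for bounded or sub-Gaussian rewards, which is the gap that motivates algorithms tailored to sub-exponential rewards.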

Journal: SSRN Electronic Journal
Publication Date: Jan 1, 2021
License type: other-oa

Similar Papers

Multi-armed bandit with sub-exponential rewards
Huiwen Jia ... Siqian Shen
Operations Research Letters | VOL. 49
12 Aug 2021

Enhancing UCB-tuned and Asymptotically Optimal UCB Algorithms through Weighted Average Techniques in Multi-Armed Bandit Scenarios
Chang Qu
Highlights in Science, Engineering and Technology | VOL. 94
26 Apr 2024

Some Variations of Upper Confidence Bound for General Game Playing
Iván Francisco-Valencia ... José Raymundo Marcial-Romero
01 Jan 2019

In-depth Exploration and Implementation of Multi-Armed Bandit Models Across Diverse Fields
Jiazhen Wu
Highlights in Science, Engineering and Technology | VOL. 94
26 Apr 2024
