Maximal Objectives in the Multiarmed Bandit with Applications

Eren Ozbay, Vijay Kamble

doi:10.1287/mnsc.2022.00801

Abstract

In several applications of the stochastic multiarmed bandit problem, the traditional objective of maximizing the expected total reward can be inappropriate. In this paper, we study a new objective in the classic setup. Given K arms, instead of maximizing the expected total reward from T pulls (the traditional “sum” objective), we consider the vector of total rewards earned from each of the K arms at the end of T pulls and aim to maximize the expected highest total reward across arms (the “max” objective). For this objective, we show that any policy must incur an instance-dependent asymptotic regret of [Formula: see text] (with a higher instance-dependent constant compared with the traditional objective) and a worst-case regret of [Formula: see text]. We then design an adaptive explore-then-commit policy featuring exploration based on appropriately tuned confidence bounds on the mean reward and an adaptive stopping criterion, which adapts to the problem difficulty and simultaneously achieves these bounds (up to logarithmic factors). We then generalize our algorithmic insights to the problem of maximizing the expected value of the average total reward of the top m arms with the highest total rewards. Our numerical experiments demonstrate the efficacy of our policies compared with several natural alternatives in practical parameter regimes. We discuss applications of these new objectives to the problem of conditioning an adequate supply of value-providing market entities (workers/sellers/service providers) in online platforms and marketplaces.

This paper was accepted by Vivek Farias, data science.

Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.00801.
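
The difference between the “sum” and “max” objectives can be illustrated with a small simulation. The sketch below is not the adaptive policy described in the abstract: it is a plain explore-then-commit baseline with a fixed exploration budget, and the Bernoulli rewards, the function names, and the parameter `explore_per_arm` are all assumptions made for illustration. It shows that, under the max objective, reward collected on arms other than the one with the highest total does not count.

```python
import random


def explore_then_commit(means, horizon, explore_per_arm, rng):
    """Fixed-budget explore-then-commit on Bernoulli arms (illustrative only).

    Pulls every arm `explore_per_arm` times in round-robin order, then commits
    to the arm with the highest empirical mean for the remaining pulls.
    Returns (total reward per arm, number of pulls per arm).
    """
    k = len(means)
    totals = [0.0] * k
    pulls = [0] * k

    def pull(arm):
        # Bernoulli reward with success probability means[arm] (an assumption).
        reward = 1.0 if rng.random() < means[arm] else 0.0
        totals[arm] += reward
        pulls[arm] += 1

    t = 0
    # Exploration phase: round-robin over the arms.
    for _ in range(explore_per_arm):
        for arm in range(k):
            if t < horizon:
                pull(arm)
                t += 1

    # Commit phase: play the empirically best arm until the horizon.
    best = max(range(k), key=lambda a: totals[a] / pulls[a] if pulls[a] else 0.0)
    while t < horizon:
        pull(best)
        t += 1

    return totals, pulls


def sum_objective(totals):
    """Traditional objective: total reward summed over all arms."""
    return sum(totals)


def max_objective(totals):
    """Max objective: highest total reward earned from any single arm."""
    return max(totals)


rng = random.Random(0)
totals, pulls = explore_then_commit([0.3, 0.5, 0.7], horizon=1000,
                                    explore_per_arm=20, rng=rng)
print("pulls per arm:", pulls)
print("sum objective:", sum_objective(totals))
print("max objective:", max_objective(totals))
```

The gap between the two printed objective values is the reward earned on arms that did not end with the highest total. That reward counts under the sum objective but not under the max objective.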

Similar Papers

Multi-armed Bandit Experimental Design: Online Decision-Making and Adaptive Inference
David Simchi-Levi ... Chonghuan Wang
Management Science | VOL. -
20 Sep 2024

Fair Exploration via Axiomatic Bargaining
Jackie Baek ... Vivek F Farias
Management Science | VOL. -
15 Mar 2024

An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.
Shangdong Yang ... Yang Gao
IEEE Transactions on Neural Networks and Learning Systems | VOL. 32
01 May 2021

The Square Root Agreement Rule for Incentivizing Truthful Feedback on Online Platforms
Vijay Kamble ... Kannan Ramchandran
Management Science | VOL. 69
01 Jul 2022
