Bandits with concave rewards and convex knapsacks

Shipra Agrawal, Nikhil R Devanur

doi:10.1145/2600057.2602844

Abstract

In this paper, we consider a very general model for the exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon. This model subsumes the classic multi-armed bandit (MAB) model and the Bandits with Knapsacks (BwK) model of Badanidiyuru et al. [2013]. We also consider an extension of this model to allow linear contexts, similar to the linear contextual extension of the MAB model. We demonstrate that a natural and simple extension of the UCB family of algorithms for MAB provides a polynomial-time algorithm that has near-optimal regret guarantees for this substantially more general model and, quite surprisingly, matches the bounds provided by Badanidiyuru et al. [2013] for the special case of BwK. We also provide computationally more efficient algorithms by establishing interesting connections between this problem and other well-studied problems and algorithms, such as the Blackwell approachability problem, online convex optimization, and the Frank-Wolfe technique for convex optimization.
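For orientation, the UCB family mentioned in the abstract can be illustrated on the classic MAB special case. The following is a minimal sketch of the textbook UCB1 index rule, assuming rewards in [0, 1]; it is not the paper's algorithm for concave rewards or knapsack constraints, and the names `ucb1`, `pull`, and the arm probabilities are hypothetical choices made for this example.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run textbook UCB1 for `horizon` rounds.

    `pull(arm)` is a caller-supplied function (hypothetical here) that
    returns a reward in [0, 1] for the chosen arm.
    """
    counts = [0] * n_arms    # number of times each arm has been played
    means = [0.0] * n_arms   # empirical mean reward of each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # play every arm once to initialise the estimates
        else:
            # Optimistic index: empirical mean plus a confidence radius.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arm means
    counts, means = ucb1(lambda a: float(rng.random() < probs[a]), len(probs), 2000)
    print(counts, means)
```

The sketch only covers the unconstrained, linear-reward case; the model in the abstract additionally evaluates a concave reward and convex constraints on the decisions across time, which this code does not attempt to handle.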

Similar Papers

A survey of the application and technical improvement of the multi-armed bandit
Ruoyi Tong
Applied and Computational Engineering | VOL. 77
16 Jul 2024

Models and Efficient Algorithms for Convex Optimization under Uncertainty
20 Aug 2019

In-depth Exploration and Implementation of Multi-Armed Bandit Models Across Diverse Fields
Jiazhen Wu
Highlights in Science, Engineering and Technology | VOL. 94
26 Apr 2024

Almost Optimal Channel Access in Multi-Hop Networks with Unknown Channel Variables
Yaqin Zhou ... Xiang-Yang Li
01 Jun 2014
