Abstract
We study multi-armed bandit problems with a budget constraint and variable costs (MAB-BV). In this setting, pulling an arm yields a random reward together with a random cost, and the objective of an algorithm is to pull a sequence of arms so as to maximize the expected total reward while keeping the total cost of the pulls within a budget. This new setting models many Internet applications (e.g., ad exchange, sponsored search, and cloud computing) more accurately than previous settings in which pulling an arm is either costless or incurs a fixed cost. We propose two UCB-based algorithms for the new setting. The first algorithm requires prior knowledge of a lower bound on the expected costs in order to compute its exploration term. The second algorithm removes this requirement by estimating the minimal expected cost from empirical observations, and can therefore be applied to real-world applications where such prior knowledge is unavailable. We prove that both algorithms achieve regret bounds of O(ln B), where B is the budget. Furthermore, we show that applying our algorithms to a previous setting with fixed costs (which can be regarded as a special case of ours) improves the previously obtained regret bound. Our simulation results on real-time bidding in ad exchange verify the effectiveness of the algorithms and are consistent with our theoretical analysis.
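To make the setting concrete, below is a minimal Python sketch of a UCB-style policy for budgeted bandits with variable costs, in the spirit of the first algorithm described above. The reward-to-cost ratio index and the exact shape of the exploration bonus are illustrative assumptions rather than the paper's precise formulas; `cost_lower_bound` stands in for the assumed prior knowledge of a lower bound on the expected costs.

```python
import math
import random

def ucb_bv_sketch(arms, budget, cost_lower_bound):
    """Hedged sketch of a UCB-style policy for budgeted bandits with
    variable costs. `arms` is a list of (reward_sampler, cost_sampler)
    pairs; cost samplers are assumed to return strictly positive costs.
    The index below is illustrative, not the paper's exact definition."""
    k = len(arms)
    pulls = [0] * k
    reward_sum = [0.0] * k
    cost_sum = [0.0] * k
    total_reward = 0.0

    def pull(i):
        nonlocal budget, total_reward
        r, c = arms[i][0](), arms[i][1]()
        pulls[i] += 1
        reward_sum[i] += r
        cost_sum[i] += c
        budget -= c
        total_reward += r

    # Initialization: pull each arm once (stopping if the budget runs out).
    for i in range(k):
        if budget <= 0:
            return total_reward
        pull(i)

    t = k
    while budget > 0:
        t += 1

        def index(i):
            # Empirical reward-to-cost ratio plus an exploration bonus.
            # The bonus is scaled by the known lower bound on expected
            # costs; when the confidence radius exceeds that bound, the
            # clamp makes the index blow up, forcing exploration.
            ratio = reward_sum[i] / max(cost_sum[i], 1e-12)
            eps = math.sqrt(math.log(t - 1) / pulls[i])
            bonus = (1 + 1 / cost_lower_bound) * eps
            return ratio + bonus / max(cost_lower_bound - eps, 1e-12)

        pull(max(range(k), key=index))
    return total_reward

# Hypothetical usage: two arms with uniform rewards and costs, where
# 0.2 is a valid lower bound on both arms' expected costs.
random.seed(0)
arms = [
    (lambda: 0.8 * random.random(), lambda: 0.3 + 0.4 * random.random()),
    (lambda: 0.5 * random.random(), lambda: 0.2 + 0.3 * random.random()),
]
print(ucb_bv_sketch(arms, budget=100.0, cost_lower_bound=0.2))
```

The second algorithm in the paper would replace the fixed `cost_lower_bound` with an estimate of the minimal expected cost computed from the observed costs so far; the overall pull loop would be unchanged.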