Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions

Kei Takemura, Naonori Kakimura, Shinji Ito, Takuro Fukunaga, Ken-Ichi Kawarabayashi, Hanna Sumita, Daisuke Hatano

doi:10.1609/aaai.v35i11.17177

Abstract

The contextual combinatorial semi-bandit problem with linear payoff functions is a decision-making problem in which, in each round, a learner chooses a set of arms, each associated with a feature vector, under given constraints so as to maximize the sum of the rewards of the chosen arms. Several existing algorithms have regret bounds that are optimal with respect to the number of rounds T. However, there is a gap of Õ(max(√d, √k)) between the current best upper and lower bounds, where d is the dimension of the feature vectors, k is the number of arms chosen in a round, and Õ(·) hides logarithmic factors. The dependence on k and d is of practical importance because k may be larger than T in real-world applications such as recommender systems. In this paper, we close the gap by improving both the upper and lower bounds. More precisely, we show that the C2UCB algorithm proposed by Qin, Chen, and Zhu (2014) has the optimal regret bound Õ(d√kT + dk) under partition matroid constraints. For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C2UCB algorithm and demonstrate that it enjoys the optimal regret bound for a more general problem that can take other objectives into account simultaneously. We also show that our technique is applicable to related problems. Numerical experiments support our theoretical results and considerations.
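To make the setting concrete, the following is a minimal sketch of a C2UCB-style learner on synthetic data. It is not the authors' implementation. It assumes the simplest constraint (choose any k arms, so the optimal set is the top k by score), a fixed exploration weight `alpha` instead of the theoretical confidence radius, and Gaussian reward noise. The names `ucb_scores`, `sherman_morrison_update`, and `simulate`, and all default parameter values, are hypothetical.

```python
import math
import random


def ucb_scores(contexts, V_inv, b, alpha):
    """Optimistic estimate per arm: x^T theta_hat + alpha * sqrt(x^T V^{-1} x)."""
    d = len(b)
    # Ridge-regression estimate theta_hat = V^{-1} b.
    theta = [sum(V_inv[i][j] * b[j] for j in range(d)) for i in range(d)]
    scores = []
    for x in contexts:
        Vx = [sum(V_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        mean = sum(xi * ti for xi, ti in zip(x, theta))
        width = math.sqrt(max(sum(xi * vi for xi, vi in zip(x, Vx)), 0.0))
        scores.append(mean + alpha * width)
    return scores


def sherman_morrison_update(V_inv, x):
    """Replace V_inv in place by (V + x x^T)^{-1} using a rank-one update."""
    d = len(x)
    Vx = [sum(V_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + sum(xi * vi for xi, vi in zip(x, Vx))
    for i in range(d):
        for j in range(d):
            V_inv[i][j] -= Vx[i] * Vx[j] / denom


def simulate(T=200, n_arms=20, d=3, k=4, alpha=1.0, lam=1.0, seed=0):
    """Run T rounds on synthetic linear rewards and return cumulative regret."""
    rng = random.Random(seed)
    theta_star = [rng.uniform(-1, 1) for _ in range(d)]  # unknown to the learner
    V_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    regret = 0.0
    for _ in range(T):
        contexts = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_arms)]
        scores = ucb_scores(contexts, V_inv, b, alpha)
        # Assumed constraint: any k arms are feasible, so take the top k scores.
        chosen = sorted(range(n_arms), key=lambda i: scores[i], reverse=True)[:k]
        means = [sum(xi * ti for xi, ti in zip(x, theta_star)) for x in contexts]
        best = sum(sorted(means, reverse=True)[:k])
        regret += best - sum(means[i] for i in chosen)
        # Semi-bandit feedback: one noisy reward is observed per chosen arm.
        for i in chosen:
            reward = means[i] + rng.gauss(0.0, 0.1)
            sherman_morrison_update(V_inv, contexts[i])
            for j in range(d):
                b[j] += reward * contexts[i][j]
    return regret


if __name__ == "__main__":
    print(simulate())
```

Under a partition matroid or another constraint, the top-k selection would be replaced by an oracle that maximizes the sum of scores over the feasible sets; the estimate and update steps would stay the same in this sketch.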


Published in: Proceedings of the AAAI Conference on Artificial Intelligence
