Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback

Yuko Kuroki, Junya Honda, Atsushi Miyauchi, Masashi Sugiyama, Liyuan Xu

doi:10.1162/neco_a_01299

Abstract

We study the problem of stochastic multiple-arm identification, where an agent sequentially explores a size-k subset of arms (also known as a super arm) from n given arms and tries to identify the best super arm. Most work so far has considered the semi-bandit setting, where the agent can observe the reward of each pulled arm, or has assumed that each arm can be queried individually at each round. However, in real-world applications, it is costly or sometimes impossible to observe the reward of individual arms. In this study, we tackle the full-bandit setting, where only a noisy observation of the total reward of a super arm is given at each pull. Although our problem can be regarded as an instance of best arm identification in linear bandits, a naive approach based on linear bandits is computationally infeasible since the number of super arms K is exponential. To cope with this problem, we first design a polynomial-time approximation algorithm for a 0-1 quadratic programming problem arising in confidence ellipsoid maximization. Based on our approximation algorithm, we propose a bandit algorithm whose computation time is O(log K), thereby achieving an exponential speedup over linear bandit algorithms. We provide a sample complexity upper bound that is still worst-case optimal. Finally, we conduct experiments on large-scale data sets with more than 10^10 super arms, demonstrating the superiority of our algorithms in terms of both computation time and sample complexity.
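As a rough illustration of the setting described in the abstract, the sketch below simulates full-bandit feedback on a toy instance. It is not the authors' algorithm: the arm means, the Gaussian noise model, and the helper names (`num_super_arms`, `pull`, `best_super_arm_bruteforce`) are all assumptions made for this example. It shows only that a pull returns a single noisy sum, and that exhaustive search over super arms stops being practical quickly (for example, `math.comb(100, 10)` already exceeds 10**13).

```python
import math
import random
from itertools import combinations


def num_super_arms(n, k):
    """Number of size-k subsets (super arms) of n arms."""
    return math.comb(n, k)


def pull(super_arm, means, rng, noise_std=1.0):
    """Full-bandit feedback: return only the noisy total reward.

    The per-arm rewards are never revealed to the caller.
    Gaussian noise is an assumption of this sketch.
    """
    total = sum(means[i] for i in super_arm)
    return total + rng.gauss(0.0, noise_std)


def best_super_arm_bruteforce(means, k):
    """Exhaustive search over all size-k subsets (tiny n only)."""
    return max(
        combinations(range(len(means)), k),
        key=lambda s: sum(means[i] for i in s),
    )


rng = random.Random(0)
means = [0.1, 0.9, 0.4, 0.7, 0.2]  # hypothetical true arm means
k = 2

best = best_super_arm_bruteforce(means, k)
observation = pull(best, means, rng)  # one scalar per pull
```

In a real run the agent does not know `means`; it sees only values like `observation` and must infer which super arm is best from them.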

Journal: Neural Computation
Publication Date: Jul 20, 2020
Citations: 9

Similar Papers

Misspecified Linear Bandits
Avishek Ghosh ... Aditya Gopalan
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 31
12 Feb 2017

FedConPE: Efficient Federated Conversational Bandits with Heterogeneous Clients
Zhuohua Li ... Maoli Liu
01 Aug 2024

Using Linear Stochastic Bandits to extend traditional offline Designed Experiments to online settings
Nandan Sudarsanam ... Balaraman Ravindran
Computers & Industrial Engineering | VOL. 115
02 Dec 2017

Sequential Learning of Product Recommendations With Customer Disengagement
Hamsa Bastani ... Divya Singhvi
SSRN Electronic Journal
13 Sep 2018
