Semi-Parametric Sampling for Stochastic Bandits with Many Arms

Mingdong Ou, Rong Jin, Cheng Yang, Shenghuo Zhu, Nan Li

doi:10.1609/aaai.v33i01.33017933

Abstract

We consider the stochastic bandit problem with a large candidate arm set. In this setting, classic multi-armed bandit algorithms, which assume independence among arms and adopt a non-parametric reward model, are inefficient because of the large number of arms. Contextual bandit algorithms exploit arm correlations through a parametric reward model over arm features and are therefore more efficient. However, they can also suffer large regret in practical applications, because mis-specified model assumptions or incomplete features bias their reward estimates. In this paper, we propose a novel Bayesian framework for this problem, called Semi-Parametric Sampling (SPS), which employs a semi-parametric function as the reward model. Specifically, the parametric part of SPS, which models the expected reward as a parametric function of the arm features, can efficiently eliminate poor arms from the candidate set. The non-parametric part of SPS, which adopts a non-parametric reward model, revises the parametric estimate to avoid estimation bias, especially on the remaining candidate arms. We give an implementation of SPS, Linear SPS (LSPS), which uses a linear function as the parametric part. In a semi-parametric environment, theoretical analysis shows that LSPS achieves a better regret bound than existing approaches, namely $\tilde{O}(\sqrt{N^{1-\alpha}}\, d^{\alpha} \sqrt{T})$ with $\alpha \in [0, 1]$, where $N$ is the number of arms, $d$ the feature dimension, and $T$ the time horizon. Experiments also demonstrate the superiority of the proposed approach.
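The abstract describes SPS only at a high level, so the following is not the authors' LSPS. It is a minimal Thompson-sampling sketch under an assumed semi-parametric model $r_a = x_a^\top \theta + b_a + \varepsilon$: a shared linear part $\theta$ (Gaussian prior with precision `lam`), an arm-specific offset $b_a$ playing the role of the non-parametric correction (Gaussian prior with standard deviation `tau`), and Gaussian noise with known `sigma`. Each round draws one joint posterior sample of $(\theta, b)$ and pulls the arm with the highest sampled reward. The class `SemiParametricTS`, the linear-algebra helpers, the prior values, and the `run_demo` environment are all hypothetical choices made for illustration.

```python
import math
import random


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, y):
    """Solve L x = y by forward substitution."""
    x = []
    for i, row in enumerate(L):
        x.append((y[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution (L is lower triangular)."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = s / L[i][i]
    return x


class SemiParametricTS:
    """Hypothetical sketch: Thompson sampling for r_a = x_a . theta + b_a + N(0, sigma^2)."""

    def __init__(self, features, lam=1.0, tau=2.0, sigma=0.1, seed=0):
        self.n_arms, self.d = len(features), len(features[0])
        # Augmented feature z_a = [x_a, e_a]: shared linear part plus a one-hot slot for b_a.
        self.Z = [list(x) + [1.0 if k == a else 0.0 for k in range(self.n_arms)]
                  for a, x in enumerate(features)]
        m = self.d + self.n_arms
        # Zero-mean Gaussian prior: precision lam on theta, 1 / tau^2 on each offset b_a.
        self.P = [[0.0] * m for _ in range(m)]
        for i in range(m):
            self.P[i][i] = lam if i < self.d else 1.0 / tau ** 2
        self.h = [0.0] * m  # running sum of z_a * r / sigma^2
        self.sigma2 = sigma ** 2
        self.rng = random.Random(seed)

    def select(self):
        L = cholesky(self.P)
        mean = solve_upper_t(L, solve_lower(L, self.h))  # posterior mean P^{-1} h
        eps = [self.rng.gauss(0.0, 1.0) for _ in self.h]
        noise = solve_upper_t(L, eps)  # L^{-T} eps has covariance P^{-1}
        w = [mu + e for mu, e in zip(mean, noise)]  # one joint posterior draw of (theta, b)
        scores = [sum(zi * wi for zi, wi in zip(z, w)) for z in self.Z]
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, reward):
        # Conjugate Gaussian update: P += z z^T / sigma^2, h += z r / sigma^2.
        z = self.Z[arm]
        for i, zi in enumerate(z):
            self.h[i] += zi * reward / self.sigma2
            for j, zj in enumerate(z):
                self.P[i][j] += zi * zj / self.sigma2


def run_demo(T=500, n_arms=10, seed=0):
    # Hypothetical environment: arms on the unit circle, true theta = (1, 0), and one
    # arm whose offset of +2.5 the linear part alone cannot represent.
    feats = [(math.cos(2 * math.pi * a / n_arms), math.sin(2 * math.pi * a / n_arms))
             for a in range(n_arms)]
    offsets = [2.5 if a == n_arms // 2 else 0.0 for a in range(n_arms)]
    means = [x[0] + b for x, b in zip(feats, offsets)]  # x . (1, 0) + b_a
    agent = SemiParametricTS(feats, seed=seed)
    env = random.Random(seed + 1)
    counts = [0] * n_arms
    for _ in range(T):
        arm = agent.select()
        agent.update(arm, means[arm] + env.gauss(0.0, 0.1))
        counts[arm] += 1
    return counts, max(range(n_arms), key=means.__getitem__)
```

In the demo the best arm is the one the linear part alone would rank last, and the per-arm offset is what lets the sampler recover it; `tau` controls how far the non-parametric correction may move an arm away from the linear prediction. Sampling from the exact joint posterior over the augmented $(d + N)$-dimensional parameter costs $O((d+N)^3)$ per round, so this sketch only illustrates the reward model and does not address the many-arm efficiency that SPS targets.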



More From: Proceedings of the AAAI Conference on Artificial Intelligence


Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jul 17, 2019
Citations: 3

Similar Papers

Best Arm Identification for Both Stochastic and Adversarial Multi-armed Bandits
Hantao Zhang ... Cong Shen
01 Nov 2018

Regulation of exploration for simple regret minimization in Monte-Carlo tree search
Yun-Ching Liu ... Yoshimasa Tsuruoka
01 Aug 2015

Information Directed Sampling and Bandits with Heteroscedastic Noise
...
03 Jul 2018

Achieving complete learning in Multi-Armed Bandit problems
Sattar Vakili ... Qing Zhao
01 Nov 2013
