A Linear Response Bandit Problem
Alexander Goldenshluger, Assaf Zeevi
Stochastic Systems, Vol. 3, No. 1 (June 2013), pp. 230-261. Published online: 26 Aug 2013. https://doi.org/10.1287/11-SSY032
Received: August 1, 2011. Copyright © 2013, The author(s).

Abstract

We consider a two-armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of parameters whose values are not known a priori. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that combine myopic action based on least squares estimates with a suitable "forced sampling" strategy. It is shown that the regret grows logarithmically in the time horizon n, and that no policy can achieve a slower growth rate over all feasible problem instances. In this setting of linear response bandits, the identity of the sub-optimal action changes with the values of the covariate vector, and the optimal policy is subject to sampling from the inferior population at a rate that grows like n.

Keywords: sequential allocation, estimation, bandit problems, regret, minimax, rate-optimal policy

Highlights

  • Sequential allocation problems, otherwise known as multi-armed bandit problems, arise frequently in various areas of statistics, adaptive control, marketing, economics, and machine learning.

  • The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that combine myopic action based on least squares estimates with a suitable "forced sampling" strategy.

  • Assumption (A1) states that X_t is a random vector with a non-degenerate distribution over R^d, which ensures that the arm parameters β_1 and β_2 are identifiable.

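The mechanics of such a policy can be sketched in code. The following is a minimal illustration on synthetic data, not the authors' exact policy: the parameter values, the N(0,1) noise model, and the dyadic forced-sampling grid are all assumptions made for the demo. It combines the two ingredients named above — a sparse forced-sampling schedule and myopic arm selection based on per-arm least-squares estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 3, 2000                       # covariate dimension, time horizon
# True (unknown) arm parameters beta_1, beta_2 -- arbitrary values for the demo.
beta = [np.array([1.0, 0.5, -0.5]),
        np.array([0.2, -0.4, 1.0])]

# Forced-sampling schedule: each arm is pulled at a sparse, prescribed set
# of times regardless of the estimates (O(log n) times on this dyadic grid).
forced = [set(), set()]
t = 1
while t <= n:
    forced[0].add(t)
    forced[1].add(t + 1)
    t *= 2

X = rng.normal(size=(n + 1, d))      # i.i.d. covariates X_t (non-degenerate, cf. (A1))
obs = [([], []), ([], [])]           # per-arm history of (covariate, response)
est = [np.zeros(d), np.zeros(d)]     # least-squares estimates of beta_1, beta_2
total_reward = 0.0

for t in range(1, n + 1):
    x = X[t]
    if t in forced[0]:
        a = 0                        # forced exploration of arm 1
    elif t in forced[1]:
        a = 1                        # forced exploration of arm 2
    else:
        # Myopic step: pull the arm whose current least-squares estimate
        # predicts the larger mean response at the observed covariate.
        a = int(x @ est[1] > x @ est[0])
    y = x @ beta[a] + rng.normal()   # linear response plus N(0,1) noise
    obs[a][0].append(x)
    obs[a][1].append(y)
    total_reward += y
    # Refresh the least-squares estimate of the arm just sampled.
    if len(obs[a][1]) >= d:
        A = np.array(obs[a][0])
        b = np.array(obs[a][1])
        est[a] = np.linalg.lstsq(A, b, rcond=None)[0]
```

Note how the identity of the myopically preferred arm changes with the covariate x: each population is "inferior" on part of the covariate space, which is why both populations end up sampled at a rate linear in n even though the regret grows only logarithmically.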


Author affiliations: University of Haifa and Columbia University
