Stochastic Systems, Vol. 3, No. 1 (Open Access)

A Linear Response Bandit Problem
Alexander Goldenshluger, Assaf Zeevi
Published Online: 26 Aug 2013
https://doi.org/10.1287/11-SSY032

Abstract
We consider a two-armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of parameters whose values are not known a priori. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting and develop rate-optimal policies that combine myopic action based on least squares estimates with a suitable “forced sampling” strategy. It is shown that the regret grows logarithmically in the time horizon n, and no policy can achieve a slower growth rate over all feasible problem instances. In this setting of linear response bandits, the identity of the sub-optimal action changes with the values of the covariate vector, and the optimal policy is subject to sampling from the inferior population at a rate that grows like √n.
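The abstract describes the policy only at a high level. As a rough illustration, and not the authors' algorithm, the Python sketch below simulates a two-armed linear response bandit and runs a policy that interleaves forced sampling with myopic actions based on per-arm least squares estimates. The function name run_linear_bandit, the log-order forced-sampling schedule, the constant c0, the Gaussian noise model, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def run_linear_bandit(X, beta, sigma=0.1, c0=None, seed=0):
    """Minimal sketch of a two-armed linear response bandit run.

    Combines forced sampling with myopic (greedy) actions based on
    per-arm least squares estimates, in the spirit of the policy the
    abstract describes.  The log-order forced-sampling schedule, the
    Gaussian noise, and all names here are illustrative assumptions,
    not the authors' exact specification.

    X    : (n, d) array of covariate vectors revealed one per round.
    beta : (2, d) true parameter vectors, used only to simulate rewards.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if c0 is None:
        c0 = max(d, 2)        # assumed constant in the forced-sampling schedule
    covs = {0: [], 1: []}     # covariates observed under each arm
    rews = {0: [], 1: []}     # rewards observed under each arm
    total = 0.0
    for t in range(n):
        x = X[t]
        # Forced sampling: keep each arm's sample count above a log-order floor.
        floor = int(np.ceil(c0 * np.log(t + 2)))
        if len(rews[0]) < floor:
            arm = 0
        elif len(rews[1]) < floor:
            arm = 1
        else:
            # Myopic action: pick the arm whose least squares estimate
            # predicts the larger mean response at the current covariate.
            preds = []
            for a in (0, 1):
                A = np.asarray(covs[a])
                y = np.asarray(rews[a])
                beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
                preds.append(x @ beta_hat)
            arm = int(np.argmax(preds))
        reward = x @ beta[arm] + sigma * rng.standard_normal()
        covs[arm].append(x)
        rews[arm].append(reward)
        total += reward
    return total


if __name__ == "__main__":
    # Synthetic example: which arm is better depends on the covariate.
    rng = np.random.default_rng(1)
    n, d = 2000, 3
    X = rng.standard_normal((n, d))
    beta = np.array([[1.0, 0.5, -0.5],
                     [0.2, -1.0, 1.0]])
    print("cumulative reward:", run_linear_bandit(X, beta))
```

In this sketch the floor grows logarithmically with the round index, so the number of forced pulls stays of logarithmic order while the least squares estimates for both arms keep accumulating data; the remaining rounds act greedily on the current covariate.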
Article Information
Volume 3, Issue 1, June 2013, Pages 1-321
Received: August 01, 2011
Published Online: August 26, 2013
Copyright © 2013, The author(s)

Cite as: Alexander Goldenshluger, Assaf Zeevi (2013) A Linear Response Bandit Problem. Stochastic Systems 3(1):230-261. https://doi.org/10.1287/11-SSY032

Keywords: sequential allocation; estimation; bandit problems; regret; minimax; rate-optimal policy