Abstract

Policy learning from historical observational data is an important problem with widespread applications. Examples include selecting offers, prices, or advertisements for consumers; choosing bids in contextual first-price auctions; and selecting medications based on patients' characteristics. However, the existing literature rests on a crucial assumption: that the future environment in which the learned policy will be deployed is the same as the past environment that generated the data, an assumption that is often false or at best a coarse approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy from incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well a policy performs under a worst-case environment shift. We then establish a central limit theorem-type guarantee for this policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that learns a policy robust to adversarial perturbations and unknown covariate shifts, with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm on synthetic data sets and demonstrate that it provides robustness that standard policy learning algorithms lack. We conclude the paper with a comprehensive application of our methods to a real-world voting data set.

This paper was accepted by Hamid Nazerzadeh, data science.

Funding: This work was supported by the National Science Foundation [Grant CCF-2106508] and the Air Force Office of Scientific Research [Award FA9550-20-1-0397]. Z. Zhou also gratefully acknowledges the JP Morgan AI Research Grant and the New York University's Center for Global Economy and Business faculty research grant for support on this work. Additional support is gratefully acknowledged from the National Science Foundation [Grants 1915967 and 2118199].

Supplemental Material: The data files and online appendix are available at https://doi.org/10.1287/mnsc.2023.4678.
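To illustrate the kind of worst-case evaluation the abstract describes, the following is a minimal sketch, not the paper's actual procedure: it computes the worst-case mean reward of a fixed policy over all distributions within a KL-divergence ball of radius `delta` around the empirical distribution, using the standard convex dual of that inner minimization. The choice of a KL uncertainty set, the radius `delta`, and the function name `worst_case_value` are illustrative assumptions; the paper's formulation and guarantees are in the full text.

```python
import numpy as np

def worst_case_value(rewards, delta):
    """Worst-case mean reward over all Q with KL(Q || P_hat) <= delta.

    Uses the duality
        inf_{KL(Q||P)<=delta} E_Q[r]
          = sup_{lam>0} ( -lam * log E_P[exp(-r/lam)] - lam * delta ),
    approximated here by a coarse grid search over lam.
    """
    r = np.asarray(rewards, dtype=float)

    def dual(lam):
        # Numerically stable log-mean-exp of -r/lam.
        z = -r / lam
        m = z.max()
        log_mean_exp = m + np.log(np.mean(np.exp(z - m)))
        return -lam * log_mean_exp - lam * delta

    lams = np.logspace(-6, 6, 4000)  # grid over the dual variable
    return max(dual(lam) for lam in lams)
```

For `delta = 0` the worst-case value recovers the empirical mean; as `delta` grows, the value decreases toward the minimum observed reward, which is the sense in which the robust estimate hedges against environment shift.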

