Offline Planning and Online Learning Under Recovering Rewards

David Simchi-Levi, Zeyu Zheng, Feng Zhu

doi:10.1287/mnsc.2021.04202

Abstract

Motivated by emerging applications, such as live-streaming e-commerce, promotions, and recommendations, we introduce and solve a general class of nonstationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to [Formula: see text] out of N different arms in each time period and (ii) the expected reward of an arm immediately drops after it is pulled and then nonparametrically recovers as the arm’s idle time increases. With the objective of maximizing the expected cumulative reward over T time periods, we design a class of purely periodic policies that jointly set a period to pull each arm. For the proposed policies, we prove performance guarantees for both the offline and the online problems. For the offline problem when all model parameters are known, the proposed periodic policy obtains a long-run approximation ratio that is on the order of [Formula: see text], which is asymptotically optimal when K grows to infinity. For the online problem when the model parameters are unknown and need to be dynamically learned, we integrate the offline periodic policy with the upper confidence bound procedure to construct an online policy. The proposed online policy is proved to approximately have [Formula: see text] regret against the offline benchmark. Our framework and policy design may shed light on broader offline planning and online learning applications with nonstationary and recovering rewards. This paper was accepted by J. George Shanthikumar, data science. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2021.04202.
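To make the setting concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It simulates a purely periodic schedule in which each arm is pulled once every fixed number of periods, rewards depend on the arm's idle time, and at most k arms may be pulled per period. The function names, the (period, offset) schedule encoding, the example recovery functions, and the convention that an arm's first pull earns the reward for an idle time equal to its period are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical encoding: arm name -> (period, offset), with 0 <= offset < period.
# The arm is pulled at every time t where t % period == offset.
Schedule = Dict[str, Tuple[int, int]]
Recovery = Dict[str, Callable[[int], float]]


def simulate_periodic(schedule: Schedule, recovery: Recovery,
                      horizon: int, k: int) -> float:
    """Return the cumulative expected reward of a periodic schedule."""
    total = 0.0
    last_pull: Dict[str, int] = {}
    for t in range(horizon):
        pulled = [arm for arm, (period, offset) in schedule.items()
                  if t % period == offset]
        if len(pulled) > k:
            raise ValueError(f"period {t}: {len(pulled)} arms pulled, limit is {k}")
        for arm in pulled:
            period, _ = schedule[arm]
            # Assumption: the first pull is treated as if the arm had already
            # been idle for one full period.
            idle = t - last_pull[arm] if arm in last_pull else period
            total += recovery[arm](idle)
            last_pull[arm] = t
    return total


def long_run_average(schedule: Schedule, recovery: Recovery) -> float:
    """Per-period reward when each arm i is pulled every d_i periods: sum of R_i(d_i) / d_i."""
    return sum(recovery[arm](period) / period
               for arm, (period, _) in schedule.items())


# Illustrative non-decreasing recovery functions (made up for this example).
example_recovery: Recovery = {
    "A": lambda d: float(min(d, 3)),
    "B": lambda d: 0.5 * min(d, 4),
}
# Two arms alternate, so exactly one arm is pulled in every period (k = 1).
example_schedule: Schedule = {"A": (2, 0), "B": (2, 1)}

if __name__ == "__main__":
    print(simulate_periodic(example_schedule, example_recovery, horizon=4, k=1))
    print(long_run_average(example_schedule, example_recovery))
```

In this sketch the offsets stagger the arms so that the per-period limit is respected; an online variant would have to estimate the recovery functions from observed rewards rather than take them as given.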

