Marrying Stochastic Gradient Descent with Bandits: Learning Algorithms for Inventory Systems with Fixed Costs

Hao Yuan, Cong Shi, Qi Luo

doi:10.1287/mnsc.2020.3799

Abstract

We consider a periodic-review single-product inventory system with fixed cost under censored demand. Under full demand distributional information, it is well known that the celebrated (s, S) policy is optimal. In this paper, we assume the firm does not know the demand distribution a priori and makes adaptive inventory ordering decisions in each period based only on the past sales (a.k.a. censored demand). Our performance measure is regret, which is the cost difference between a feasible learning algorithm and the clairvoyant (full-information) benchmark. Compared with prior literature, the key difficulty of this problem lies in the loss of joint convexity of the objective function as a result of the presence of fixed cost. We develop the first learning algorithm, termed the [Formula: see text] policy, that combines the power of stochastic gradient descent, bandit controls, and simulation-based methods in a seamless and nontrivial fashion. We prove that the cumulative regret is [Formula: see text], which is provably tight up to a logarithmic factor. We also develop several technical results that are of independent interest. We believe that the developed framework could be widely applied to learning other important stochastic systems with partial convexity in the objectives. This paper was accepted by Chung Piaw Teo, optimization.
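To make the setting in the abstract concrete, the following is a minimal sketch of a periodic-review (s, S) policy with a fixed ordering cost, where only sales (censored demand) are observed, together with an empirical regret against a benchmark policy. It is not the paper's learning algorithm. The function names, the cost parameters K, h, and p, the zero lead time, the lost-sales assumption, and the "order when inventory is at or below s" convention are all assumptions made for illustration.

```python
import random


def simulate_sS(s, S, demands, K=10.0, h=1.0, p=4.0):
    """Average per-period cost of an (s, S) policy (illustrative sketch).

    Assumptions (not from the paper): zero lead time, lost sales,
    K = fixed ordering cost, h = unit holding cost, p = unit lost-sales penalty.
    Returns (average_cost, sales); sales are the censored demand observations.
    """
    inventory = 0.0
    total_cost = 0.0
    sales = []
    for d in demands:
        # Order up to S whenever inventory is at or below the reorder point s.
        if inventory <= s:
            total_cost += K
            inventory = S
        # The firm observes only sales, never demand in excess of stock.
        sold = min(d, inventory)
        sales.append(sold)
        total_cost += h * (inventory - sold) + p * (d - sold)
        inventory -= sold
    return total_cost / len(demands), sales


def empirical_regret(policy, benchmark, demands, **costs):
    """Cumulative cost gap between a policy and a benchmark on one demand path."""
    avg_policy, _ = simulate_sS(*policy, demands, **costs)
    avg_bench, _ = simulate_sS(*benchmark, demands, **costs)
    return len(demands) * (avg_policy - avg_bench)


if __name__ == "__main__":
    rng = random.Random(0)
    demands = [rng.randint(0, 10) for _ in range(1000)]
    # Stand-in for the clairvoyant benchmark: best (s, S) on a small grid,
    # evaluated on the same demand path (an assumption for illustration only).
    grid = [(s, S) for s in range(0, 10) for S in range(s + 1, 30)]
    best = min(grid, key=lambda pol: simulate_sS(*pol, demands)[0])
    print("benchmark (s, S):", best)
    print("regret of guess (2, 8):", empirical_regret((2, 8), best, demands))
```

The fixed cost K is what breaks joint convexity in (s, S): ordering less often saves K but raises holding or lost-sales cost, so the average cost is not a simple convex function of both parameters. In this sketch the benchmark is found by brute force with full knowledge of demand, which a learning algorithm under censored demand cannot do.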

Journal: Management Science
Publication Date: Feb 8, 2021
Citations: 47
