Abstract

We study a dynamic inventory control problem with fixed setup costs and random demand. The planning horizon is infinite, and the model primitives, including costs and the demand distribution, are stationary. When the demand distribution is known, an $(s,S)$ policy is known to minimize the long-run per-period average cost. To model situations involving new products or previously unencountered economic conditions, however, we depart from the traditional model by letting the stationary demand distribution be largely unknown: it may lie anywhere in a given ambiguity set. Our goal is to rein in the long-run growth of the regret incurred by a policy that learns the underlying demand distribution while simultaneously making ordering decisions based on what it has learned. We propose a policy that controls the pace at which a traditional $(s,S)$-computing algorithm is applied to the empirical distribution of the demand observed over time. The policy's regret over $T$ periods is bounded by $O(T^{1/2}\cdot(\ln T)^{1/2})$.
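Since the abstract describes the policy only at a high level, the following Python sketch is purely illustrative. The cost parameters (a setup cost K, holding cost h, backorder cost p), the finite demand support, the doubling-epoch update schedule, and the brute-force grid search standing in for a traditional $(s,S)$-computing algorithm (e.g., one of the Zheng-Federgruen type) are all assumptions of ours, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

K, h, p = 10.0, 1.0, 4.0        # assumed setup, holding, and backorder costs
DEMANDS = np.arange(9)          # assumed finite demand support {0, ..., 8}

def period_cost(inv):
    """Holding cost on leftover stock, backorder penalty on shortfall."""
    return h * max(inv, 0.0) + p * max(-inv, 0.0)

def avg_cost(s, S, pmf, horizon=500):
    """Monte Carlo estimate of the long-run per-period average cost of the
    (s, S) policy when demand is drawn i.i.d. from `pmf` on DEMANDS."""
    inv, total = S, 0.0
    for d in rng.choice(DEMANDS, size=horizon, p=pmf):
        if inv <= s:            # order up to S, paying the setup cost K
            total += K
            inv = S
        inv -= d
        total += period_cost(inv)
    return total / horizon

def best_sS(pmf):
    """Stand-in for a traditional (s, S)-computing algorithm: a noisy
    brute-force search over a small grid of (s, S) pairs."""
    pairs = [(s, s + g) for s in range(-4, 10) for g in range(1, 12)]
    return min(pairs, key=lambda sS: avg_cost(*sS, pmf))

def run_learning_policy(true_pmf, T=4096):
    """Order according to the (s, S) pair computed from the empirical demand
    distribution, recomputing only at doubling epochs t = 1, 2, 4, 8, ...
    to control the pace at which the subroutine is invoked."""
    counts = np.ones(len(DEMANDS))           # uniform pseudo-counts to start
    s, S = best_sS(counts / counts.sum())
    inv, total = S, 0.0
    for t in range(1, T + 1):
        if inv <= s:
            total += K
            inv = S
        d = rng.choice(DEMANDS, p=true_pmf)  # demand revealed, then learned
        counts[d] += 1.0
        inv -= d
        total += period_cost(inv)
        if (t & (t - 1)) == 0:               # t is a power of two: re-solve
            s, S = best_sS(counts / counts.sum())
    return total / T

if __name__ == "__main__":
    true_pmf = np.full(len(DEMANDS), 1.0 / len(DEMANDS))
    print("average per-period cost:", run_learning_policy(true_pmf))
```

Recomputing only at geometrically spaced epochs is one natural way to "control the pace" of updates, since each re-solve is costly and early empirical distributions are unreliable; the schedule that actually achieves the stated $O(T^{1/2}\cdot(\ln T)^{1/2})$ regret bound is determined in the paper itself.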
