On the Hardness of Learning from Censored Demand

Gabor Lugosi, Gergely Neu, Mihalis Markakis

doi:10.2139/ssrn.3509255

Abstract

Problem definition: We consider a repeated newsvendor problem in which the inventory manager has no prior information about the demand and can access only censored data. The manager needs to simultaneously explore and exploit with her inventory decisions in order to minimize the cumulative cost that the firm incurs. We study the hardness of the problem without relying on any probabilistic assumptions about the demand, and we develop inventory control policies with guaranteed performance.

Academic/practical relevance: The problem is motivated by multi-period inventory management of perishable goods, such as newspapers, fresh food, or certain pharmaceutical products, where demand can be learned only through sales. Demand for many goods is non-stationary (e.g., it exhibits trends and/or seasonality), yet the existing literature offers policies that are tailored to, or facilitated by, time stationarity.

Methodology: We adopt the regret criterion for performance evaluation. Combining concepts and results from partial monitoring, we couple a carefully designed cost estimator with the well-known Exponentially Weighted Forecaster.

Results: We develop a simple, easy-to-interpret policy whose expected regret achieves optimal scaling (up to logarithmic factors) with respect to both the number of time periods and the number of available actions. We demonstrate the flexibility of our approach by extending these performance guarantees to (i) tracking regret, a powerful notion of regret that uses a large class of non-stationary action sequences as its benchmark, and (ii) single-warehouse multi-retailer inventory management of a perishable product.

Managerial implications: Our results lead to two important insights: both the benefit from “information stalking” and the cost of censoring are insignificant in this setting, which paves the way for the design of applicable heuristic policies. Further supported by numerical experiments, our findings illustrate the performance loss that can be incurred when policies designed under stationarity assumptions are applied to non-stationary environments.
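To make the methodology more concrete, the Python sketch below couples an exponentially weighted forecaster with an importance-weighted cost estimator under censored feedback. It rests on assumptions that are not taken from the paper: orders and demand are integers, the orders 0, ..., K-1 form the action set, and the per-period cost is h*(order - demand)^+ + p*(demand - order)^+ with illustrative values h = 1 and p = 2. The estimator exploits a one-sided structure: after ordering a and observing sales min(a, d), the increment cost(i) - cost(i-1) equals -p if d >= i and h otherwise, and it is revealed for every i <= a. Dividing each observed increment by P(order >= i) gives an unbiased estimate of it. This is one possible construction in the spirit of the abstract, not necessarily the authors' estimator or tuning; the function names and the learning rate eta are hypothetical.

```python
import math
import random

# Hypothetical sketch: h, p, eta and all function names are illustrative assumptions.


def newsvendor_cost(order, demand, h=1.0, p=2.0):
    """Overage cost h per unsold unit plus underage cost p per unit of unmet demand."""
    return h * max(order - demand, 0) + p * max(demand - order, 0)


def cost_increment(i, sales, h=1.0, p=2.0):
    """cost(i) - cost(i - 1), valid only for i <= order.

    For i <= order, sales = min(order, demand) >= i exactly when demand >= i,
    so the censored observation reveals this increment.
    """
    return -p if sales >= i else h


def censored_exp_weights(demands, n_actions, eta, h=1.0, p=2.0, seed=0):
    """Exponentially weighted forecaster over orders 0..n_actions-1 (illustrative).

    est[x] accumulates an importance-weighted estimate of sum_t [cost_t(x) - cost_t(0)];
    the common shift by cost_t(0) does not change the exponential weights.
    """
    rng = random.Random(seed)
    est = [0.0] * n_actions
    total_cost = 0.0
    for d in demands:
        m = min(est)  # shift for numerical stability; probabilities are unchanged
        weights = [math.exp(-eta * (e - m)) for e in est]
        s = sum(weights)
        probs = [w / s for w in weights]
        a = rng.choices(range(n_actions), weights=probs)[0]
        sales = min(a, d)  # the only feedback: censored demand
        total_cost += newsvendor_cost(a, d, h, p)
        # tail[i] = P(order >= i), the probability that increment i is observed
        tail = [0.0] * (n_actions + 1)
        for i in range(n_actions - 1, -1, -1):
            tail[i] = tail[i + 1] + probs[i]
        running = 0.0
        for i in range(1, n_actions):
            if a >= i:  # increment i is observed only when the order covers it
                running += cost_increment(i, sales, h, p) / tail[i]
            est[i] += running
    return total_cost, est


if __name__ == "__main__":
    gen = random.Random(1)
    demands = [gen.randint(2, 6) for _ in range(2000)]
    cost, est = censored_exp_weights(demands, n_actions=10, eta=0.05)
    print(f"average cost per period: {cost / len(demands):.3f}")
    print("order with lowest estimated cumulative cost:", est.index(min(est)))
```

Estimating cost increments rather than full costs is what makes censoring manageable in this sketch: the increments below the chosen order are always observable, whereas the absolute cost of a shortfall is not. Tracking regret and the multi-retailer extension mentioned in the abstract are not covered here.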
