We consider a repeated newsvendor problem in which the inventory manager has no prior information about the demand and can access only censored/sales data. In analogy to multiarmed bandit problems, the manager needs to simultaneously “explore” and “exploit” with inventory decisions in order to minimize the cumulative cost. Our goal is to understand the hardness of the problem disentangled from any probabilistic assumptions on the demand sequence (importantly, independence or time stationarity) and, correspondingly, to develop policies that perform well with respect to the regret criterion. We design a cost estimator that is tailored to the special structure of the censoring problem, and we show that, if coupled with the classic exponentially weighted forecaster, it achieves optimal scaling of the expected regret (up to logarithmic factors) with respect to both the number of time periods and available actions. This result also leads to two important insights: the benefit from “information stalking” as well as the cost of censoring are both negligible, at least in terms of the regret. We demonstrate the flexibility of our technique by combining it with the fixed share forecaster to provide strong guarantees in terms of tracking regret, a powerful notion of regret that uses a large class of time-varying action sequences as benchmark. Numerical experiments suggest that the resulting policy outperforms existing policies (that are tailored to or facilitated by time stationarity) on nonstationary demand models with time-varying noise, trend, and seasonality components. Finally, we consider the “combinatorial” version of the repeated newsvendor problem, that is, single-warehouse, multiretailer inventory management of a perishable product. We extend the proposed approach so that, again, it achieves near-optimal performance in terms of the regret.
Funding: G. Lugosi was supported by the Spanish Ministry of Economy, Industry and Competitiveness [Grant MTM2015-67304-P (AEI/FEDER, UE)]. M. G. Markakis was supported by the Spanish Ministry of Economy and Competitiveness [Grant ECO2016-75905-R (AEI/FEDER, UE)] and a Juan de la Cierva fellowship as well as the Spanish Ministry of Science and Innovation through a Ramón y Cajal fellowship. G. Neu was supported by the UPFellows Fellowship (Marie Curie COFUND program) [Grant 600387].
Supplemental Material: The e-companion is available at https://doi.org/10.1287/ijoo.2022.0017.