We study policy optimization for the feature-based newsvendor, which seeks an end-to-end policy that provides an explicit mapping from features to ordering decisions. Most existing works restrict the policies to a parametric class that may be suboptimal (such as the affine class) or lack interpretability (such as neural networks). In contrast, we aim to optimize over all functions of features. In this case, classic empirical risk minimization yields a policy that is not well-defined on unseen feature values. To avoid such degeneracy, we consider a Wasserstein distributionally robust framework. This leads to an adjustable robust optimization problem, whose optimal solutions are notoriously difficult to obtain except in a few notable cases. Perhaps surprisingly, we identify a new class of policies that are proven to be exactly optimal and can be computed efficiently. The optimal robust policy is obtained by extending an optimal robust in-sample policy to unobserved feature values in a particular way and can be interpreted as a Lipschitz-regularized critical fractile of the empirical conditional demand distribution. We compare our method with several benchmarks using synthetic and real data and demonstrate its superior empirical performance.

This paper was accepted by J. George Shanthikumar, data science.

Supplemental Material: The online appendix and data are available at https://doi.org/10.1287/mnsc.2023.4810.
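For readers unfamiliar with the critical fractile mentioned in the abstract, the following is a minimal sketch of the classical, feature-free version: the order quantity is the smallest observed demand at which the empirical distribution function reaches b / (b + h), where b is the per-unit underage cost and h the per-unit overage cost. It is not the paper's Lipschitz-regularized robust policy. The function names, the cost parameters, and the sample data are all assumptions made for illustration.

```python
import math


def empirical_critical_fractile(demands, underage_cost, overage_cost):
    """Smallest order quantity whose empirical CDF reaches b / (b + h).

    Hypothetical helper: illustrates the classical newsvendor rule only,
    with no features and no robustness.
    """
    if not demands:
        raise ValueError("need at least one demand observation")
    ratio = underage_cost / (underage_cost + overage_cost)
    ordered = sorted(demands)
    n = len(ordered)
    # Smallest k with k / n >= ratio; the tolerance guards against
    # floating-point noise when ratio * n is an exact integer.
    k = max(1, math.ceil(ratio * n - 1e-12))
    return ordered[k - 1]


def average_newsvendor_cost(order, demands, underage_cost, overage_cost):
    """Average underage-plus-overage cost of a fixed order over the sample."""
    total = 0.0
    for d in demands:
        total += underage_cost * max(d - order, 0)
        total += overage_cost * max(order - d, 0)
    return total / len(demands)
```

With the illustrative sample `[10, 20, 30, 40]`, an underage cost of 3, and an overage cost of 1, the target fractile is 0.75 and `empirical_critical_fractile` returns 30. Applying the same rule separately at each observed feature value gives no guidance for feature values outside the sample, which is the degeneracy the abstract attributes to empirical risk minimization over all functions of features.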