Optimal Robust Policy for Feature-Based Newsvendor

We study policy optimization for the feature-based newsvendor, which seeks an end-to-end policy that gives an explicit mapping from features to ordering decisions. Unlike existing works that restrict policies to a parametric class, which may be suboptimal (such as the affine class) or lack interpretability (such as neural networks), we aim to optimize over all measurable functions of features. In this case, classical empirical risk minimization yields a policy that is not well defined on unseen features. To avoid such degeneracy, we consider a distributionally robust framework. This leads to an adjustable robust optimization problem, whose optimal solutions are notoriously difficult to obtain except in a few notable cases. Perhaps surprisingly, we identify a new class of policies that are provably exactly optimal and can be computed efficiently. The optimal robust policy is obtained by extending an optimal robust in-sample policy to unobserved features in a particular way, and it can be interpreted as a Lipschitz-regularized critical fractile of the empirical conditional demand distribution. We compare our method with several benchmarks using real data and demonstrate its superior empirical performance.
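
For readers unfamiliar with the term, the following is a minimal sketch of the classical (feature-free) empirical critical fractile that the abstract alludes to. It is not the robust policy of the paper. The cost form `b * (d - q)^+ + h * (q - d)^+`, the names `b` (unit underage cost) and `h` (unit overage cost), the function names, and the demand samples are all assumptions made for illustration.

```python
import math

def newsvendor_cost(q, d, b, h):
    # Assumed cost: b per unit of unmet demand, h per unit of leftover stock.
    return b * max(d - q, 0.0) + h * max(q - d, 0.0)

def average_cost(q, demands, b, h):
    # Empirical (sample-average) cost of ordering q against observed demands.
    return sum(newsvendor_cost(q, d, b, h) for d in demands) / len(demands)

def empirical_critical_fractile(demands, b, h):
    # Smallest observed demand whose empirical CDF reaches b / (b + h).
    ds = sorted(demands)
    ratio = b / (b + h)
    k = max(1, math.ceil(ratio * len(ds)))
    return ds[k - 1]

# Hypothetical demand samples and costs, for illustration only.
demands = [12.0, 7.0, 15.0, 9.0, 11.0]
b, h = 3.0, 1.0
q_star = empirical_critical_fractile(demands, b, h)
print(q_star, average_cost(q_star, demands, b, h))
```

With features, applying this rule separately at each observed feature value says nothing about features that were never observed, which is the degeneracy the abstract attributes to empirical risk minimization over all measurable functions.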
