The Universal Classification (UC) problem seeks an optimal classifier from a universal policy space that minimizes the expected 0-1 loss, also known as the misclassification risk. However, conventional empirical risk minimization often leads to overfitting and poor out-of-sample performance. To address this limitation, we introduce the Distributionally Robust Universal Classification (DRUC) formulation, which incorporates distributional robustness via a Wasserstein ambiguity set centered at the empirical distribution. To manage the infinite-dimensional nature of the DRUC policy space, we develop an in-sample DRUC counterpart, which admits a more tractable reformulation while preserving the robustness properties. We prove that the in-sample DRUC formulation is asymptotically equivalent to the DRUC formulation and converges to the original UC formulation. Under mild conditions, we further provide finite-sample (non-asymptotic) performance guarantees. Furthermore, we derive a mixed-integer linear programming (MILP) reformulation that computes the optimal in-sample DRUC policy and propose an efficient 2-approximation algorithm. Our numerical experiments demonstrate the efficiency of the proposed approximation algorithm and the superior out-of-sample performance of the in-sample DRUC formulation.
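As a rough sketch of the formulation described above (the radius symbol $\theta$, the policy space $\mathcal{F}$, and the notation for the empirical distribution are illustrative choices, not taken from the paper), the DRUC problem can be read as a Wasserstein distributionally robust counterpart of 0-1 risk minimization:

\[
\min_{f \in \mathcal{F}} \;\; \sup_{\mathbb{Q} : \, W(\mathbb{Q}, \widehat{\mathbb{P}}_N) \le \theta} \;\; \mathbb{E}_{(x, y) \sim \mathbb{Q}} \big[ \mathbb{1}\{ f(x) \neq y \} \big],
\]

where $\widehat{\mathbb{P}}_N$ denotes the empirical distribution of the $N$ training samples, $W(\cdot,\cdot)$ the Wasserstein distance, and $\theta \ge 0$ the robustness radius; setting $\theta = 0$ recovers ordinary empirical risk minimization.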
Citation
Chen, S., Xie, W. (2025). Distributionally Robust Universal Classification: Bypassing the Curse of Dimensionality. Available at Optimization Online.