We study decision problems under uncertainty, where the decision-maker has access to K data sources that carry biased information about the underlying risk factors. The biases are measured by the mismatch between the risk factor distribution and the K data-generating distributions with respect to an optimal transport (OT) distance. In this situation the decision-maker can exploit the information contained in the biased samples by solving a distributionally robust optimization (DRO) problem, where the ambiguity set is defined as the intersection of K OT neighborhoods, each of which is centered at the empirical distribution on the samples generated by a biased data source. We show that if the decision-maker has a prior belief about the biases, then the out-of-sample performance of the DRO solution can improve with K-irrespective of the magnitude of the biases. We also show that, under standard convexity assumptions, the proposed DRO problem is computationally tractable if either K or the dimension of the risk factors is kept constant.
Article
View Wasserstein Distributionally Robust Optimization with Heterogeneous Data Sources