The traditional two-stage stochastic program minimizes the total expected cost under parameter uncertainty and assumes that the distribution of the random parameters is known. In most practical settings, however, the true distribution is unknown and only a limited amount of historical data is available. If the estimated distribution is inaccurate, which is typically the case when historical data are scarce, the solution of the traditional two-stage stochastic program can be biased and suboptimal for the true problem. In this paper, we study the data-driven risk-averse stochastic optimization problem. Instead of assuming that the distribution of the random parameters is known, we observe a series of historical data drawn from the true distribution. Based on these data, we construct a confidence set for the ambiguous distribution of the random parameters and develop a risk-averse stochastic optimization framework that minimizes the total expected cost under the worst-case distribution within this confidence set. We use the Wasserstein metric to construct the confidence set, and with this metric we reformulate the risk-averse two-stage stochastic program as a tractable counterpart. In addition, we derive the worst-case distribution and develop efficient algorithms to solve the reformulated problem. Moreover, our convergence analysis shows that the risk aversion of the proposed formulation vanishes as the amount of historical data grows to infinity, so that the optimal objective value converges to that of the traditional risk-neutral two-stage stochastic program. Finally, numerical experiments on facility location and stochastic unit commitment problems verify the effectiveness of the proposed approach.
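To make the confidence-set idea concrete, consider a generic data-driven risk-averse problem of the form $\min_x \; c^\top x + \max_{P \in \mathcal{D}} \mathbb{E}_P[Q(x,\xi)]$ with $\mathcal{D} = \{P : W(P, \hat{P}_N) \le \theta\}$, where $\hat{P}_N$ is the empirical distribution of the $N$ historical samples and $\theta$ is the radius of the Wasserstein ball. These symbols are our own illustrative notation and not necessarily the paper's. The sketch below is a one-dimensional toy under stated assumptions, not the paper's reformulation or algorithm. It computes the Wasserstein-1 distance between two equally weighted empirical samples of the same size (in 1-D the optimal transport plan matches sorted samples in order), tests whether a candidate sample lies in the confidence set, and evaluates the generic bound $\mathbb{E}_{\hat{P}_N}[f] + L\theta$ on the worst-case expectation of an $L$-Lipschitz cost $f$, which follows from Kantorovich–Rubinstein duality. The function names are hypothetical.

```python
from typing import Callable, Sequence


def wasserstein1_empirical(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Wasserstein-1 distance between two 1-D empirical distributions that
    put equal weight on each of the same number of samples (assumption).
    In 1-D, the optimal transport plan pairs the sorted samples in order."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def in_confidence_set(candidate: Sequence[float],
                      historical: Sequence[float],
                      theta: float) -> bool:
    """Hypothetical helper: is the candidate empirical distribution inside
    the Wasserstein ball of radius theta around the historical data?"""
    return wasserstein1_empirical(candidate, historical) <= theta


def worst_case_upper_bound(cost: Callable[[float], float],
                           lipschitz: float,
                           historical: Sequence[float],
                           theta: float) -> float:
    """Upper bound on max E_P[cost] over the Wasserstein-1 ball, valid for an
    L-Lipschitz cost: empirical mean plus L * theta. As theta shrinks to 0,
    the bound collapses to the risk-neutral empirical mean."""
    empirical_mean = sum(cost(x) for x in historical) / len(historical)
    return empirical_mean + lipschitz * theta


if __name__ == "__main__":
    historical = [1.0, 2.0, 4.0]
    theta = 0.5
    # Shifting every sample by theta moves the distribution exactly theta away.
    shifted = [x + theta for x in historical]
    print(wasserstein1_empirical(shifted, historical))  # 0.5
    # For the linear cost 3*x (Lipschitz constant 3), this shift attains the bound.
    print(worst_case_upper_bound(lambda x: 3.0 * x, 3.0, historical, theta))  # 8.5
    print(sum(3.0 * x for x in shifted) / len(shifted))  # 8.5
```

In this toy, the shifted distribution lies on the boundary of the confidence set and achieves the bound exactly, while letting $\theta \to 0$ recovers the empirical (risk-neutral) expected cost. This mirrors, in a very simplified setting, the abstract's claim that the risk aversion vanishes as more data justify a smaller confidence set.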

## Article

Data-Driven Risk-Averse Stochastic Optimization with Wasserstein Metric