Data-Driven Stochastic Dual Dynamic Programming: Performance Guarantees and Regularization Schemes

We propose a data-driven scheme for multistage stochastic linear programming with Markovian random parameters by extending the stochastic dual dynamic programming (SDDP) algorithm. In our data-driven setting, only a finite number of historical trajectories are available. The proposed SDDP scheme evaluates the cost-to-go functions only at the observed sample points, where the conditional expectations are estimated empirically using kernel regression. The scheme thus avoids the construction of scenario trees, which may incur exponential time complexity during the backward induction step. However, if the training data are sparse, the resulting SDDP algorithm exhibits a high optimistic bias that gives rise to poor out-of-sample performance. To mitigate these small-sample effects, we adopt ideas from distributionally robust optimization (DRO) and replace the empirical conditional expectation in the cost-to-go function with a worst-case conditional expectation over a polyhedral ambiguity set. We derive a theoretical out-of-sample performance guarantee for the data-driven SDDP scheme and demonstrate its effectiveness through extensive numerical experiments.
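
To make the kernel regression step concrete, the following minimal sketch estimates a conditional expectation of next-stage costs at an observed sample point. It assumes a Nadaraya-Watson estimator with a Gaussian kernel and a scalar random parameter; the abstract does not specify the kernel or the estimator form, and the names `nw_weights`, `conditional_expectation`, `bandwidth`, and the toy data are hypothetical illustrations rather than the authors' implementation.

```python
import math


def nw_weights(xi_query, xi_samples, bandwidth):
    """Normalized Gaussian kernel weights of each sample relative to xi_query.

    Assumption: Gaussian kernel on a scalar parameter; the paper may use a
    different kernel or a vector-valued parameter.
    """
    raw = [math.exp(-0.5 * ((xi_query - x) / bandwidth) ** 2) for x in xi_samples]
    total = sum(raw)
    return [w / total for w in raw]


def conditional_expectation(xi_query, xi_samples, next_values, bandwidth):
    """Kernel-weighted average of next-stage values, given the current parameter."""
    weights = nw_weights(xi_query, xi_samples, bandwidth)
    return sum(w * v for w, v in zip(weights, next_values))


# Toy data (hypothetical): observed stage-t parameters from three trajectories
# and the cost-to-go value each trajectory attains at stage t+1.
xi_t = [0.0, 1.0, 2.0]
cost_next = [10.0, 20.0, 30.0]

weights = nw_weights(1.0, xi_t, bandwidth=0.5)
estimate = conditional_expectation(1.0, xi_t, cost_next, bandwidth=0.5)
print(weights, estimate)
```

In this sketch the estimate is a convex combination of the sampled next-stage values, so a robust variant in the spirit of the abstract would perturb these weights within a polyhedral set instead of fixing them.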
