We consider the stochastic dual dynamic programming (SDDP) algorithm, which is a widely employed algorithm applied to multistage stochastic programming, and propose a variant using batch learning, a technique used with success in the reinforcement learning framework. We cast SDDP as a type of Q-learning algorithm and describe its application in both risk neutral and risk averse settings. We demonstrate the efficiency of the algorithm on a lost sales inventory control problem with lead times, as well as a real-world instance of the long-term planning problem of interconnected hydropower plants in Colombia. We find that the proposed technique is able to produce tighter optimality gaps in a shorter amount of time than conventional SDDP, including the PSR SDDP commercial software. We also find that parallel computation of SDDP backward passes benefit from batch learning.

## Article

Download

View Batch Learning in Stochastic Dual Dynamic Programming