Batch Learning in Stochastic Dual Dynamic Programming

We consider the stochastic dual dynamic programming (SDDP) algorithm, a widely used method for multistage stochastic programming, and propose a variant using batch learning, a technique applied with success in the reinforcement learning framework. We cast SDDP as a type of Q-learning algorithm and describe its application in both risk-neutral and risk-averse settings. We demonstrate the efficiency of the algorithm on a lost-sales inventory control problem with lead times, as well as a real-world instance of the long-term planning problem of interconnected hydropower plants in Colombia. We find that the proposed technique produces tighter optimality gaps in less time than conventional SDDP, including the PSR SDDP commercial software. We also find that parallel computation of SDDP backward passes benefits from batch learning.
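To make the idea of batching concrete, below is a minimal, hypothetical sketch, not the paper's algorithm, of a batched backward pass on a toy convex cost-to-go function: each pass evaluates a batch of trial states and appends one Benders-style cut per state, instead of a single cut per iteration as in conventional SDDP. All names and parameters here (PENALTY, DEMANDS, the batch size) are illustrative assumptions.

```python
# Hypothetical sketch of batched cut generation, SDDP-style (assumed, not
# the paper's implementation).  The future cost Q(x) = E[p * (d - x)^+]
# (expected lost-sales penalty for stock level x under random demand d)
# is approximated from below by cuts; "batch learning" here means each
# backward pass adds a batch of cuts rather than one.
import numpy as np

rng = np.random.default_rng(0)
PENALTY = 4.0                                 # per-unit penalty (assumed)
DEMANDS = rng.uniform(0.0, 10.0, size=200)    # sampled demand scenarios

def cost_and_subgradient(x):
    """Sample-average cost E[p*(d-x)^+] and a subgradient in x."""
    shortfall = np.maximum(DEMANDS - x, 0.0)
    cost = PENALTY * shortfall.mean()
    grad = -PENALTY * (DEMANDS > x).mean()    # derivative of the average
    return cost, grad

cuts = []   # list of (intercept, slope): Q(x) >= intercept + slope * x

def lower_bound(x):
    """Current cut approximation of Q at x."""
    return max((a + b * x for a, b in cuts), default=0.0)

def backward_pass(trial_states):
    """Batched backward pass: one new cut per trial state."""
    for x in trial_states:
        cost, grad = cost_and_subgradient(x)
        cuts.append((cost - grad * x, grad))  # tangent cut at x

for it in range(20):
    batch = rng.uniform(0.0, 10.0, size=8)    # batch of trial states
    backward_pass(batch)

# The cut model under-approximates Q; compare at a few points.
for x in (2.0, 5.0, 8.0):
    print(f"x={x}: exact={cost_and_subgradient(x)[0]:.3f}, "
          f"cuts={lower_bound(x):.3f}")
```

Since the cuts for a whole batch are independent of one another, they can be computed concurrently, which is consistent with the abstract's observation that parallel backward passes benefit from batching.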
