We consider the information relaxation approach for calculating performance bounds for stochastic dynamic programs (DPs), following Brown, Smith, and Sun (2010). This approach generates performance bounds by solving problems with relaxed nonanticipativity constraints and a penalty that punishes violations of these nonanticipativity constraints. In this paper, we study DPs that have a convex structure and consider gradient penalties that are based on ﬁrst-order linear approximations of approximate value functions. When used with perfect information relaxations, these penalties lead to subproblems that are deterministic convex optimization problems. We show that these gradient penalties can, in theory, provide tight bounds for convex DPs and can be used to improve on bounds provided by other relaxations, such as Lagrangian relaxation bounds. Finally, we apply these results in two example applications: ﬁrst, a network revenue management problem that describes an airline trying to manage seat capacity on its ﬂights; and second, an inventory management problem with lead times and lost sales. These are challenging problems of significant practical interest. In both examples, we compute performance bounds using information relaxations with gradient penalties and ﬁnd that some relatively easy-to-compute heuristic policies are nearly optimal.
To appear, Operations Research, 2014.