Dual Dynamic Programming with cut selection: convergence proof and numerical experiments

We consider convex optimization problems formulated using dynamic programming equations. Such problems can be solved using the Dual Dynamic Programming algorithm combined with the Level 1 cut selection strategy or the Territory algorithm to select the most relevant Benders cuts. We propose a limited memory variant of Level 1 and show the convergence of DDP … Read more

Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $\cS$ and a finite action space $\cA$. We show that any randomized algorithm needs a running time at least $\Omega(\carS^2\carA)$ to compute an $\epsilon$-optimal policy with high probability. We consider two variants of the MDP where the … Read more

Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time

We propose a randomized linear programming algorithm for approximating the optimal policy of the discounted Markov decision problem. By leveraging the value-policy duality, the algorithm adaptively samples state transitions and makes exponentiated primal-dual updates. We show that it finds an ε-optimal policy using nearly-linear running time in the worst case. For Markov decision processes that … Read more

From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming

We consider linear programming (LP) problems in infinite dimensional spaces that are in general computationally intractable. Under suitable assumptions, we develop an approximation bridge from the infinite-dimensional LP to tractable finite convex programs in which the performance of the approximation is quantified explicitly. To this end, we adopt the recent developments in two areas of … Read more

Dynamic programming algorithms, efficient solution of the LP-relaxation and approximation schemes for the Penalized Knapsack Problem

We consider the 0-1 Penalized Knapsack Problem (PKP). Each item has a profit, a weight and a penalty and the goal is to maximize the sum of the profits minus the greatest penalty value of the items included in a solution. We propose an exact approach relying on a procedure which narrows the relevant range … Read more

Regularized Stochastic Dual Dynamic Programming for convex nonlinear optimization problems

We define a regularized variant of the Dual Dynamic Programming algorithm called REDDP (REgularized Dual Dynamic Programming) to solve nonlinear dynamic programming equations. We extend the algorithm to solve nonlinear stochastic dynamic programming equations. The corresponding algorithm, called SDDP-REG, can be seen as an extension of a regularization of the Stochastic Dual Dynamic Programming (SDDP) … Read more

Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods … Read more

Fully Polynomial Time (Sigma,Pi)-Approximation Schemes for Continuous Nonlinear Newsvendor and Continuous Stochastic Dynamic Programs

We study the continuous newsvendor problem (i.e. a newsvendor problem concerning goods of a non-discrete nature, such as fresh fruit juice) and a class of stochastic dynamic programs with several application areas, such as inventory control of a continuous good, economics, and supply chain management. The class is characterized by continuous state and action spaces, … Read more

Relaxation Analysis for the Dynamic Knapsack Problem with Stochastic Item Sizes

We consider a version of the knapsack problem in which an item size is random and revealed only when the decision maker attempts to insert it. After every successful insertion the decision maker can dynamically choose the next item based on the remaining capacity and available items, while an unsuccessful insertion terminates the process. We … Read more

A Copositive Approach for Two-Stage Adjustable Robust Optimization with Uncertain Right-Hand Sides

We study two-stage adjustable robust linear programming in which the right-hand sides are uncertain and belong to a convex, compact uncertainty set. This problem is NP-hard, and the affine policy is a popular, tractable approximation. We prove that under standard and simple conditions, the two-stage problem can be reformulated as a copositive optimization problem, which … Read more