Nonstationary Direct Policy Search for Risk-Averse Stochastic Optimization

This paper presents an approach to non-stationary policy search for finite-horizon, discrete-time Markovian decision problems with large state spaces, constrained action sets, and a risk-sensitive optimality criterion. The methodology relies on modeling time-variant policy parameters by a non-parametric response surface model for an indirect parametrized policy motivated by the Bellman equation. Through the interpolating …

Semi-Infinite Relaxations for the Dynamic Knapsack Problem with Stochastic Item Sizes

We consider a version of the knapsack problem in which an item size is random and revealed only when the decision maker attempts to insert it. After every successful insertion the decision maker can choose the next item dynamically based on the remaining capacity and available items, while an unsuccessful insertion terminates the process. We …

Two approaches to constrained stochastic optimal control problems

In this article, we study and compare two approaches to solving stochastic optimal control problems with an expectation constraint on the final state. The case of a probability constraint is included in this framework. The first approach is based on a dynamic programming principle and the second one uses Lagrange relaxation. These approaches can be …

Parameter-free Sampled Fictitious Play for Solving Deterministic Dynamic Programming Problems

To facilitate fast solution of deterministic dynamic programming problems, we present a parameter-free variation of the Sampled Fictitious Play (SFP) algorithm. Its random tie-breaking procedure imparts a natural randomness to the algorithm which prevents it from “getting stuck” at a local optimal solution and allows the discovery of an optimal path in a finite number …

Simplex Algorithm for Countable-state Discounted Markov Decision Processes

We consider discounted Markov Decision Processes (MDPs) with countably infinite state spaces, finite action spaces, and unbounded rewards. Typical examples of such MDPs are inventory management and queueing control problems in which there is no specific limit on the size of inventory or queue. Existing solution methods obtain a sequence of policies that converges to optimality …

Rectangular sets of probability measures

In this paper we consider the notion of rectangularity of a set of probability measures, introduced in Epstein and Schneider (2003), from a somewhat different point of view. We define rectangularity as a property of dynamic decomposition of a distributionally robust stochastic optimization problem and show how it relates to the modern theory of coherent …

Robust constrained shortest path problems under budgeted uncertainty

We study the robust constrained shortest path problem under resource uncertainty. After proving that the problem is NP-hard in the strong sense for arbitrary uncertainty sets, we focus on budgeted uncertainty sets introduced by Bertsimas and Sim (2003) and their extension to variable uncertainty by Poss (2013). We apply classical techniques to show that the …

Information Relaxation Bounds for Infinite Horizon Markov Decision Processes

We consider the information relaxation approach for calculating performance bounds for stochastic dynamic programs (DPs), following Brown, Smith, and Sun (2010). This approach generates performance bounds by solving problems with relaxed nonanticipativity constraints and a penalty that punishes violations of these constraints. In this paper, we study infinite horizon DPs with discounted costs and consider …

The Multi-Band Robust Knapsack Problem — A Dynamic Programming Approach —

In this paper, we consider the multi-band robust knapsack problem which generalizes the Γ-robust knapsack problem by subdividing the single deviation band into several smaller bands. We state a compact ILP formulation and develop two dynamic programming algorithms based on the presented model where the first has a complexity linear in the number of items …

Robust Data-Driven Dynamic Programming

In stochastic optimal control the distribution of the exogenous noise is typically unknown and must be inferred from limited data before dynamic programming (DP)-based solution schemes can be applied. If the conditional expectations in the DP recursions are estimated via kernel regression, however, the historical sample paths enter the solution procedure directly as they determine …