A Q-Learning Algorithm with Continuous State Space

We study in this paper a Markov Decision Problem (MDP) with continuous state space and discrete decision variables. We propose an extension of the Q-learning algorithm introduced to solve this problem by Watkins in 1989 for completely discrete MDPs. Our algorithm relies on stochastic approximation and functional estimation, and uses kernels to locally update the … Read more

Nonserial dynamic programming and local decomposition algorithms in discrete programming

One of perspective ways to exploit sparsity in the dependency graph of an optimization problem as J.N. Hooker stressed is nonserial dynamic programming (NSDP) which allows to compute solution in stages, each of them uses results from previous stages. The class of discrete optimization problems with the block-tree-structure matrix of constraints is considered. Nonserial dynamic … Read more

New solution approaches to the general single machine earliness-tardiness problem

This paper addresses the general single-machine earliness-tardiness problem with distinct release dates, due dates, and unit costs. The aim of this research is to obtain an exact nonpreemptive solution in which machine idle time is allowed. In a hybrid approach, we formulate and then solve the problem using dynamic programming (DP) while incorporating techniques from … Read more

A pricing problem under Monge property

We study a pricing problem where buyers with non-uniform demand purchase one of many items. Each buyer has a known benefit for each item and purchases the item that gives the largest utility, which is defined to be the difference between the benefit and the price of the item. The optimization problem is to decide … Read more

Temporal difference learning with kernels for pricing american-style options

We propose in this paper to study the problem of estimating the cost-to-go function for an infinite-horizon discounted Markov chain with possibly continuous state space. For implementation purposes, the state space is typically discretized. As soon as the dimension of the state space becomes large, the computation is no more practicable, a phenomenon referred to … Read more

Optimal Information Monitoring Under a Politeness Constraint

We describe scheduling algorithms for monitoring an information source whose contents change at times modeled by a nonhomogeneous Poisson process. In a given time period of length T, we enforce a politeness constraint that we may only probe the source at most n times. This constraint, along with an optional constraint that no two probes … Read more

New hybrid optimization algorithms for machine scheduling problems

Dynamic programming, branch-and-bound, and constraint programming are the standard solution principles for finding optimal solutions to machine scheduling problems. We propose a new hybrid optimization framework that integrates all three methodologies. The hybrid framework leads to powerful solution procedures. We demonstrate our approach through the optimal solution of the single-machine total weighted completion time scheduling … Read more

Aggregation in Stochastic Dynamic Programming

We present a general aggregation method applicable to all finite-horizon Markov decision problems. States of the MDP are aggregated into macro-states based on a pre-selected collection of “distinguished” states which serve as entry points into macro-states. The resulting macro-problem is also an MDP, whose solution approximates an optimal solution to the original problem. The aggregation … Read more

A Class of Hybrid Methods for Revenue Management

We consider a Markov decision process model of a network revenue management problem. Working within this formal framework, we study policies that combine aspects of mathematical programming approaches and pure Markov decision process methods. The policies employ heuristics early in the booking horizon, and switch to a more-detailed decision rule closer to the time of … Read more

Solving Stability Problems on a Superclass of Interval Graphs

We introduce a graph invariant, called thinness, and show that a maximum weighted stable set on a graph $G(V, E)$ with thinness $k$ may be found in $O(\frac{|V|}{k})^k$-time, if a certain representation is given. We show that a graph has thinness 1 if and only if it is an interval graph, while a graph with … Read more