From Optimization to Control: Quasi Policy Iteration

Recent control algorithms for Markov decision processes (MDPs) have been designed using an implicit analogy with well-established optimization algorithms. In this paper, we make this analogy explicit across four problem classes with a unified solution characterization. This novel framework, in turn, allows for a systematic transformation of algorithms from one domain to the other. In …

Dynamic courier capacity acquisition in rapid delivery systems: a deep Q-learning approach

With the recent boom of the gig economy, urban delivery systems have experienced substantial demand growth. In such systems, orders are delivered to customers from local distribution points within a promised delivery time. An important example is a restaurant meal delivery system, where delivery is expected within minutes of an order being placed. …

Batch Learning in Stochastic Dual Dynamic Programming

We consider the stochastic dual dynamic programming (SDDP) algorithm, a widely used algorithm for multistage stochastic programming, and propose a variant using batch learning, a technique used with success in the reinforcement learning framework. We cast SDDP as a type of Q-learning algorithm and describe its application in both risk-neutral and …

An Adaptive and Near Parameter-free BRKGA Using Q-Learning Method

The Biased Random-Key Genetic Algorithm (BRKGA) is an efficient metaheuristic for solving combinatorial optimization problems, but it requires parameter tuning so that the intensification and diversification of the algorithm are balanced. There is, however, no single optimal parameter configuration, and the best configuration may differ across the stages of the evolutionary process. …

SDP-based bounds for the Quadratic Cycle Cover Problem via cutting plane augmented Lagrangian methods and reinforcement learning

We study the Quadratic Cycle Cover Problem (QCCP), which aims to find a node-disjoint cycle cover in a directed graph with minimum interaction cost between successive arcs. We derive several semidefinite programming (SDP) relaxations and use facial reduction to make these strictly feasible. We investigate a nontrivial relationship between the transformation matrix used in the …

Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods …