Neur2SP: Neural Two-stage Stochastic Programming

Stochastic programming is a powerful modeling framework for decision-making under uncertainty. In this work, we tackle two-stage stochastic programs (2SPs), the most widely applied and studied class of stochastic programming models. Solving 2SPs exactly requires evaluation of an expected value function that is computationally intractable. Additionally, having a mixed-integer linear program (MIP) or a nonlinear …
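To make the "expected value function" concrete, here is a minimal sketch that approximates it by a sample average over scenarios. This is not the Neur2SP method; the toy newsvendor problem, the cost parameters, and the names `second_stage_cost` and `saa_objective` are all assumptions chosen for illustration.

```python
import random


def second_stage_cost(x, demand, shortage_cost=4.0, holding_cost=1.0):
    """Recourse cost for one scenario of a toy newsvendor problem (assumed).

    x is the first-stage order quantity; demand is the realized uncertainty.
    """
    return shortage_cost * max(demand - x, 0.0) + holding_cost * max(x - demand, 0.0)


def saa_objective(x, scenarios, unit_cost=2.0):
    """First-stage cost plus the sample average of the recourse cost."""
    expected_recourse = sum(second_stage_cost(x, d) for d in scenarios) / len(scenarios)
    return unit_cost * x + expected_recourse


# Draw hypothetical demand scenarios and pick the best order on a coarse grid.
rng = random.Random(0)
scenarios = [rng.uniform(50.0, 150.0) for _ in range(1000)]
candidates = list(range(50, 151, 5))
best_x = min(candidates, key=lambda x: saa_objective(x, scenarios))
```

Here the recourse cost has a closed form, so each evaluation is cheap. In general every scenario requires solving a second-stage optimization problem, which is why evaluating the expectation becomes expensive.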

A minibatch stochastic Quasi-Newton method adapted for nonconvex deep learning problems

In this study, we develop a limited memory nonconvex Quasi-Newton (QN) method, tailored to deep learning (DL) applications. Since the stochastic nature of (sampled) function information in minibatch processing can affect the performance of QN methods, three strategies are utilized to overcome this issue. These involve a novel progressive trust-region radius update (suitable for stochastic …

Training Structured Neural Networks Through Manifold Identification and Variance Reduction

This paper proposes an algorithm, RMDA, for training neural networks (NNs) with a regularization term for promoting desired structures. RMDA incurs no computation beyond that of proximal SGD with momentum, and achieves variance reduction without requiring the objective function to be of the finite-sum form. Through the tool of manifold identification from nonlinear optimization, we …

The structure of conservative gradient fields

The classical Clarke subdifferential alone is inadequate for understanding automatic differentiation in nonsmooth contexts. Instead, we can sometimes rely on enlarged generalized gradients called “conservative fields”, defined through the natural path-wise chain rule: one application is the convergence analysis of gradient-based deep learning algorithms. In the semi-algebraic case, we show that all conservative fields are …

Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates

In this paper, we study smooth stochastic multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions. We assume access to noisy evaluations of the functions and their gradients through a stochastic first-order oracle. For solving this class of problems, we propose two algorithms using moving-average stochastic estimates, and analyze …
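As a rough illustration of the problem class, the nested objective and a generic moving-average estimate can be written as below. The ordering of the composition, the symbols $f_i$, $u_i^k$, $\tau_k$, and the sampled evaluation $\tilde f_i$ are assumptions for illustration; the abstract does not specify the authors' notation or update rules.

```latex
% Nested composition of T functions (ordering assumed):
\min_{x \in X} \; F(x) = f_1\bigl(f_2(\cdots f_T(x) \cdots)\bigr)

% A generic moving-average estimate u_i^k of the inner value at level i,
% built from noisy evaluations \tilde f_i and a weight \tau_k \in (0, 1]:
u_i^{k+1} = (1 - \tau_k)\, u_i^{k} + \tau_k\, \tilde f_i\bigl(u_{i+1}^{k}; \xi_i^{k+1}\bigr),
\qquad i = 1, \dots, T, \quad u_{T+1}^{k} := x^{k}
```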

On the Impact of Deep Learning-based Time-series Forecasts on Multistage Stochastic Programming Policies

Multistage stochastic programming provides a modeling framework for sequential decision-making problems that involve uncertainty. One typically overlooked aspect of this methodology is how uncertainty is incorporated into modeling. Traditionally, statistical forecasting techniques with simple forms, e.g., (first-order) autoregressive time-series models, are used to extract scenarios to be added to optimization models to represent the uncertain …

An Integer Programming Approach to Deep Neural Networks with Binary Activation Functions

We study deep neural networks with binary activation functions (BDNNs), i.e., networks whose activation function has only two states. We show that a BDNN can be reformulated as a mixed-integer linear program which can be solved to global optimality by classical integer programming solvers. Additionally, a heuristic solution algorithm is presented and we study the model …
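The following is a minimal sketch of a forward pass through a network with two-state activations. It assumes the states are 0 and 1 with a threshold at zero, which the abstract does not specify; the function names and the hand-picked XOR weights are illustrative only. Each unit's 0/1 output is the kind of quantity that a mixed-integer formulation would represent with a binary variable.

```python
def binary_activation(z):
    """Two-state activation: 1 if the pre-activation is nonnegative, else 0 (assumed convention)."""
    return 1 if z >= 0 else 0


def forward(x, layers):
    """Evaluate the network on input x.

    layers is a list of (weights, biases) pairs, where weights holds one row
    of input weights per unit in that layer.
    """
    activations = list(x)
    for weights, biases in layers:
        activations = [
            binary_activation(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Hand-picked weights that compute XOR on binary inputs:
# hidden unit 1 acts as OR, hidden unit 2 as AND, output fires for OR-and-not-AND.
xor_layers = [
    ([[1, 1], [1, 1]], [-1, -2]),
    ([[1, -1]], [-1]),
]
```

Calling `forward([0, 1], xor_layers)` evaluates the hidden layer to `[1, 0]` and then the output unit, giving `[1]`.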

Stochastic generalized gradient methods for training nonconvex nonsmooth neural networks

The paper observes a similarity between the stochastic optimal control of discrete dynamical systems and the training of multilayer neural networks. It focuses on contemporary deep networks with nonconvex nonsmooth loss and activation functions. The machine learning problems are treated as nonconvex nonsmooth stochastic optimization problems. As a model of nonsmooth nonconvex dependences, the so-called generalized …

Substantiation of the Backpropagation Technique via the Hamilton-Pontryagin Formalism for Training Nonconvex Nonsmooth Neural Networks

The paper observes the similarity between the stochastic optimal control of discrete dynamical systems and the training of multilayer neural networks. It focuses on contemporary deep networks with nonconvex nonsmooth loss and activation functions. In the paper, the machine learning problems are treated as nonconvex nonsmooth stochastic optimization problems. As a model of nonsmooth nonconvex dependences, …

Generalized Gradients in Problems of Dynamic Optimization, Optimal Control, and Machine Learning

In this work, nonconvex nonsmooth problems of dynamic optimization, optimal control in discrete time (including feedback control), and machine learning are considered from a common point of view. An analogy is observed between the tasks of controlling discrete dynamic systems and training multilayer neural networks with a nonsmooth target function and connections. Methods for calculating generalized gradients …