Some new accelerated and stochastic gradient descent algorithms based on locally Lipschitz gradient constants

In this paper, we revisit the recently proposed stepsize for the gradient descent scheme, called NGD, introduced by [Hoai et al., A novel stepsize for gradient descent method, Operations Research Letters (2024) 53, doi: 10.1016/j.orl.2024.107072]. We first investigate the NGD stepsize in combination with two well-known acceleration techniques, the heavy-ball and Nesterov's methods. In …
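The abstract does not restate the NGD stepsize formula, so the following is only a minimal sketch of the general idea: a heavy-ball iteration whose stepsize is driven by a local Lipschitz estimate of the gradient computed from consecutive iterates. The function name heavy_ball_local_lipschitz and the monotone safeguard on the estimate are illustrative assumptions, not the rule of Hoai et al.

```python
import numpy as np

def heavy_ball_local_lipschitz(grad, x0, beta=0.9, iters=200, eps=1e-12):
    """Heavy-ball gradient descent with a stepsize built from a running estimate
    of the local Lipschitz constant of the gradient (illustrative sketch only)."""
    x_prev = x0.astype(float).copy()
    g_prev = grad(x_prev)
    x = x_prev - 1e-3 * g_prev            # tiny warm-up step to obtain two iterates
    L = eps
    for _ in range(iters):
        g = grad(x)
        # local Lipschitz estimate from the two most recent iterates (kept monotone
        # here as a crude safeguard; the NGD rule of Hoai et al. is more refined)
        L = max(L, np.linalg.norm(g - g_prev) / (np.linalg.norm(x - x_prev) + eps))
        x_next = x - (1.0 / L) * g + beta * (x - x_prev)   # heavy-ball update
        x_prev, g_prev, x = x, g, x_next
    return x

# usage on a simple ill-conditioned quadratic
A = np.diag([1.0, 10.0])
x_min = heavy_ball_local_lipschitz(lambda z: A @ z, np.array([5.0, -3.0]))
```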

Deep learning and hyperparameter optimization for assessing one’s eligibility for a subcutaneous implantable cardioverter-defibrillator

In cardiology, it is standard practice for patients from high-risk populations who suffer from ventricular arrhythmias (the leading cause of sudden cardiac death) to be treated with Subcutaneous Implantable Cardioverter-Defibrillators (S-ICDs). S-ICDs carry a risk of so-called T-wave oversensing (TWOS), which can lead to inappropriate shocks with an inherent health risk. For this …

Neur2SP: Neural Two-stage Stochastic Programming

Stochastic programming is a powerful modeling framework for decision-making under uncertainty. In this work, we tackle two-stage stochastic programs (2SPs), the most widely applied and studied class of stochastic programming models. Solving 2SPs exactly requires evaluation of an expected value function that is computationally intractable. Additionally, having a mixed-integer linear program (MIP) or a nonlinear …
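As a rough illustration (not the authors' Neur2SP formulation), one can approximate the expensive second-stage value with a learned surrogate and average the surrogate over scenarios instead of solving a second-stage problem per scenario. The toy recourse_value function and the use of scikit-learn's MLPRegressor below are assumptions made for the sketch; the actual method goes further and embeds the trained network inside a mixed-integer program, whereas here the surrogate is only grid-searched.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def recourse_value(x, xi):
    """Toy second-stage (recourse) cost: unmet demand is penalized, leftover is salvaged.
    Stands in for the optimal value of a second-stage LP/MIP in a real 2SP."""
    return 5.0 * np.maximum(xi - x, 0.0) + 1.0 * np.maximum(x - xi, 0.0)

# training set of (first-stage decision, scenario) -> second-stage value
X = rng.uniform(0.0, 20.0, size=2000)          # candidate first-stage decisions
Xi = rng.normal(10.0, 3.0, size=2000)          # demand scenarios
features = np.column_stack([X, Xi])
targets = recourse_value(X, Xi)

# small MLP surrogate of the recourse value function
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
surrogate.fit(features, targets)

# approximate the expected value function E_xi[Q(x, xi)] by averaging the surrogate
eval_scenarios = rng.normal(10.0, 3.0, size=1000)

def approx_expected_value(x):
    feats = np.column_stack([np.full(eval_scenarios.size, x), eval_scenarios])
    return surrogate.predict(feats).mean()

best_x = min(np.linspace(0.0, 20.0, 81), key=approx_expected_value)
```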

A minibatch stochastic Quasi-Newton method adapted for nonconvex deep learning problems

In this study, we develop a limited-memory nonconvex Quasi-Newton (QN) method tailored to deep learning (DL) applications. Since the stochastic nature of the (sampled) function information in minibatch processing can degrade the performance of QN methods, three strategies are employed to overcome this issue. These involve a novel progressive trust-region radius update (suitable for stochastic …
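The abstract only names the strategies, so the sketch below is limited to generic ingredients one would expect in such a method: a limited-memory BFGS two-loop recursion applied to minibatch gradients, with noisy curvature pairs skipped. The paper's trust-region radius update and its other strategies are not reproduced; names like minibatch_lbfgs are hypothetical.

```python
import numpy as np
from collections import deque

def two_loop_direction(grad, pairs):
    """L-BFGS two-loop recursion: returns an approximate -(H^-1) * grad direction."""
    q = grad.copy()
    alphas = []
    for s, y in reversed(pairs):               # newest pair first
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append((rho, a))
    if pairs:
        s, y = pairs[-1]
        q *= (s @ y) / (y @ y)                 # standard initial Hessian scaling
    for (s, y), (rho, a) in zip(pairs, reversed(alphas)):   # oldest pair first
        b = rho * (y @ q)
        q += (a - b) * s
    return -q

def minibatch_lbfgs(stoch_grad, x0, lr=0.1, memory=10, iters=100, curv_tol=1e-8):
    x = x0.copy()
    pairs = deque(maxlen=memory)
    g = stoch_grad(x)
    for _ in range(iters):
        d = two_loop_direction(g, list(pairs))
        x_new = x + lr * d
        g_new = stoch_grad(x_new)              # the same minibatch would be reused in practice
        s, y = x_new - x, g_new - g
        if s @ y > curv_tol * (s @ s):         # skip noisy/indefinite curvature pairs
            pairs.append((s, y))
        x, g = x_new, g_new
    return x

# usage: noisy quadratic
rng = np.random.default_rng(0)
H = np.diag(np.linspace(1.0, 10.0, 10))
sg = lambda w: H @ w + 0.01 * rng.standard_normal(10)
w_opt = minibatch_lbfgs(sg, np.ones(10))
```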

Training Structured Neural Networks Through Manifold Identification and Variance Reduction

This paper proposes an algorithm, RMDA, for training neural networks (NNs) with a regularization term that promotes desired structures. RMDA incurs no additional computation beyond that of proximal SGD with momentum, and it achieves variance reduction without requiring the objective function to be of the finite-sum form. Through the tool of manifold identification from nonlinear optimization, we …
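Since the abstract does not spell out RMDA's update, the sketch below shows only the baseline it is compared against: proximal SGD with momentum applied to an L1-regularized objective, where the soft-thresholding proximal step is what promotes the sparse structure mentioned above. All names, constants, and the toy least-squares problem are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1; zeroes small entries (sparsity)."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def prox_sgd_momentum(stoch_grad, w0, lr=0.05, beta=0.9, lam=0.1, iters=500):
    """Proximal SGD with momentum for  E[loss(w)] + lam * ||w||_1."""
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(iters):
        v = beta * v + stoch_grad(w)                 # momentum buffer
        w = soft_threshold(w - lr * v, lr * lam)     # proximal (shrinkage) step
    return w

# usage: sparse least squares with minibatch gradients
A = rng.standard_normal((200, 50))
x_true = np.zeros(50)
x_true[:5] = 3.0
b = A @ x_true + 0.1 * rng.standard_normal(200)

def stoch_grad(w, batch=32):
    i = rng.integers(0, 200, batch)
    return A[i].T @ (A[i] @ w - b[i]) / batch

w_hat = prox_sgd_momentum(stoch_grad, np.zeros(50))
```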

The structure of conservative gradient fields

The classical Clarke subdifferential alone is inadequate for understanding automatic differentiation in nonsmooth contexts. Instead, we can sometimes rely on enlarged generalized gradients called “conservative fields”, defined through the natural path-wise chain rule: one application is the convergence analysis of gradient-based deep learning algorithms. In the semi-algebraic case, we show that all conservative fields are …
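For context, the path-wise chain rule mentioned above is commonly formalized as follows; the abstract does not restate it, so this is the usual formulation rather than necessarily the exact one used in the paper.

```latex
% A set-valued map D : \mathbb{R}^n \rightrightarrows \mathbb{R}^n (graph-closed, locally bounded,
% nonempty-valued) is a conservative field for a locally Lipschitz function f if, for every
% absolutely continuous curve \gamma : [0,1] \to \mathbb{R}^n,
\frac{\mathrm{d}}{\mathrm{d}t}\, f(\gamma(t)) \;=\; \langle v, \dot{\gamma}(t) \rangle
\qquad \text{for all } v \in D(\gamma(t)), \ \text{for a.e. } t \in [0,1].
```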

Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates

In this paper, we study smooth stochastic multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions. We assume access to noisy evaluations of the functions and their gradients, through a stochastic first-order oracle. For solving this class of problems, we propose two algorithms using moving-average stochastic estimates, and analyze …
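As a rough illustration of the moving-average idea for a nested objective (restricted here to $T = 2$ levels and a toy componentwise inner map, both assumptions not taken from the paper), an auxiliary variable tracks the inner function value so that the chain rule is applied through the tracked value rather than a fresh noisy sample.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy two-level composition  F(x) = f(E[g(x; xi)])  with noisy access to g and its Jacobian.
def g_sample(x):          # inner map with additive noise
    return x**2 + 0.1 * rng.standard_normal(x.shape)

def g_jac_sample(x):      # noisy Jacobian of the inner map (diagonal, since g is componentwise)
    return 2.0 * x + 0.1 * rng.standard_normal(x.shape)

def f_grad(u):            # outer gradient, f(u) = 0.5 * ||u - 1||^2
    return u - 1.0

def moving_average_scgd(x0, lr=0.02, tau=0.1, iters=3000):
    x = x0.copy()
    u = g_sample(x)                              # moving-average tracker of the inner value
    for _ in range(iters):
        u = (1.0 - tau) * u + tau * g_sample(x)  # update the tracker
        grad_est = g_jac_sample(x) * f_grad(u)   # chain rule through the tracked value
        x -= lr * grad_est
    return x

x_est = moving_average_scgd(np.array([2.0, -1.5]))   # should approach |x_i| = 1 componentwise
```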

On the Impact of Deep Learning-based Time-series Forecasts on Multistage Stochastic Programming Policies

Multistage stochastic programming provides a modeling framework for sequential decision-making problems that involve uncertainty. One typically overlooked aspect of this methodology is how uncertainty is incorporated into modeling. Traditionally, statistical forecasting techniques with simple forms, e.g., (first-order) autoregressive time-series models, are used to extract scenarios to be added to optimization models to represent the uncertain …
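For concreteness, the kind of simple forecasting model the abstract refers to can be sketched as a first-order autoregressive scenario generator; the parameter values and the demand interpretation below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def ar1_scenarios(x0, phi=0.8, mu=10.0, sigma=1.0, stages=5, n_scenarios=100):
    """Sample scenario paths from a first-order autoregressive process
    x_{t+1} = mu + phi * (x_t - mu) + eps_t,  eps_t ~ N(0, sigma^2)."""
    paths = np.empty((n_scenarios, stages + 1))
    paths[:, 0] = x0
    for t in range(stages):
        eps = rng.normal(0.0, sigma, size=n_scenarios)
        paths[:, t + 1] = mu + phi * (paths[:, t] - mu) + eps
    return paths

scenarios = ar1_scenarios(x0=12.0)   # e.g., 100 demand paths to feed a multistage model
```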

An Integer Programming Approach to Deep Neural Networks with Binary Activation Functions

We study deep neural networks with binary activation functions (BDNN), i.e., the activation function takes only two values. We show that the BDNN can be reformulated as a mixed-integer linear program which can be solved to global optimality by classical integer programming solvers. Additionally, a heuristic solution algorithm is presented, and we study the model …
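The paper's full reformulation is not reproduced here; the following shows only the standard big-M encoding of a single binary (step) activation, assuming the pre-activation is bounded by $M$ in absolute value on the feasible region. At the boundary $w^\top x + b = 0$ both values of $z$ are feasible, which is the usual ambiguity of this encoding.

```latex
% Binary activation  z = 1  if  w^\top x + b \ge 0,  z = 0  otherwise,
% assuming  -M \le w^\top x + b \le M  on the feasible region:
w^\top x + b \;\le\; M\, z, \qquad
w^\top x + b \;\ge\; -M\,(1 - z), \qquad z \in \{0, 1\}.
```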

Stochastic generalized gradient methods for training nonconvex nonsmooth neural networks

The paper observes a similarity between the stochastic optimal control of discrete dynamical systems and the training of multilayer neural networks. It focuses on contemporary deep networks with nonconvex nonsmooth loss and activation functions. The machine learning problems are treated as nonconvex nonsmooth stochastic optimization problems. As a model of nonsmooth nonconvex dependences, the so-called generalized …
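A standard form of such a stochastic generalized gradient method (the precise assumptions analyzed in the paper may differ) is the recursion

```latex
x^{k+1} \;=\; x^{k} - \rho_k\, g^{k}, \qquad
\mathbb{E}\bigl[g^{k} \mid x^{k}\bigr] \in G f\!\left(x^{k}\right), \qquad
\rho_k \ge 0, \quad \sum_{k} \rho_k = \infty, \quad \sum_{k} \rho_k^{2} < \infty,
```

where $Gf$ denotes a generalized gradient (e.g., the Clarke subdifferential) of the nonsmooth, nonconvex loss $f$ and $g^{k}$ is its stochastic estimate at the current iterate.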