Some new accelerated and stochastic gradient descent algorithms based on locally Lipschitz gradient constants

In this paper, we revisit the recent stepsize for the gradient descent scheme called NGD, proposed by [Hoai et al., A novel stepsize for gradient descent method, Operations Research Letters (2024) 53, doi: 10.1016/j.orl.2024.107072]. We first investigate the NGD stepsize combined with two well-known acceleration techniques, the heavy ball method and Nesterov's method. In …
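For orientation, the two acceleration techniques named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it uses a fixed stepsize and momentum parameter rather than the NGD stepsize (which is not reproduced here), and the quadratic test function, `step`, and `beta` are hypothetical choices.

```python
def grad(x):
    # Gradient of the hypothetical test function f(x) = 0.5 * (x0^2 + 10 * x1^2).
    return [x[0], 10.0 * x[1]]


def heavy_ball(x0, step, beta, iters):
    # Heavy ball update: x_{k+1} = x_k - step * grad(x_k) + beta * (x_k - x_{k-1}).
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        g = grad(x)
        x_next = [xi - step * gi + beta * (xi - pi)
                  for xi, gi, pi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x


def nesterov(x0, step, beta, iters):
    # Nesterov update: extrapolate first, then take the gradient step there.
    #   y_k     = x_k + beta * (x_k - x_{k-1})
    #   x_{k+1} = y_k - step * grad(y_k)
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]
        g = grad(y)
        x_prev, x = x, [yi - step * gi for yi, gi in zip(y, g)]
    return x


if __name__ == "__main__":
    start = [5.0, -3.0]
    # Fixed stepsize 1/L with L = 10 (assumed, not the NGD rule).
    print(heavy_ball(start, step=0.1, beta=0.5, iters=200))
    print(nesterov(start, step=0.1, beta=0.5, iters=200))
```

The only difference between the two schemes is where the gradient is evaluated: at the current iterate for heavy ball, at the extrapolated point for Nesterov's method.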

Representing Integer Program Value Function with Neural Networks

We study the value function of an integer program (IP) by characterizing how its optimal value changes as the right-hand side varies. We show that the IP value function can be approximated to any desired degree of accuracy using machine learning (ML) techniques. Since an IP value function is a Chvátal-Gomory (CG) function, we first …

Alternate Training of Shared and Task-Specific Parameters for Multi-Task Neural Networks

This paper introduces novel alternate training procedures for hard-parameter sharing Multi-Task Neural Networks (MTNNs). Traditional MTNN training faces challenges in managing conflicting loss gradients, often yielding sub-optimal performance. The proposed alternate training method updates shared and task-specific weights alternately, exploiting the multi-head architecture of the model. This approach reduces computational costs, enhances training regularization, and …
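The idea of alternating between shared and task-specific updates can be sketched on a toy problem. This is a hedged illustration only: the scalar model (one shared weight, one head weight per task), the data, the learning rate, and the choice to alternate once per round are all assumptions made for the example, not details taken from the paper.

```python
# Hypothetical toy data: two regression tasks on the same inputs.
XS = [-1.0, -0.5, 0.5, 1.0]
TARGETS = {
    "task1": [2.0 * x for x in XS],
    "task2": [-3.0 * x for x in XS],
}


def task_loss(shared, head, ys):
    # Mean squared error of the prediction head * shared * x for one task.
    return sum((head * shared * x - y) ** 2 for x, y in zip(XS, ys)) / len(XS)


def total_loss(shared, heads):
    # Sum of the per-task losses.
    return sum(task_loss(shared, heads[t], TARGETS[t]) for t in TARGETS)


def train_alternately(rounds=2000, lr=0.05):
    shared = 1.0
    heads = {"task1": 0.5, "task2": 0.5}
    for _ in range(rounds):
        # Phase 1: update each task-specific head with the shared weight frozen.
        for t, ys in TARGETS.items():
            g = sum(2.0 * (heads[t] * shared * x - y) * shared * x
                    for x, y in zip(XS, ys)) / len(XS)
            heads[t] -= lr * g
        # Phase 2: update the shared weight with all heads frozen.
        g = sum(
            sum(2.0 * (heads[t] * shared * x - y) * heads[t] * x
                for x, y in zip(XS, ys)) / len(XS)
            for t, ys in TARGETS.items()
        )
        shared -= lr * g
    return shared, heads


if __name__ == "__main__":
    shared, heads = train_alternately()
    print(shared, heads, total_loss(shared, heads))
```

In each phase only one group of parameters needs gradients, which is one plausible reading of how such a scheme could reduce per-step cost.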

Using Neural Networks to Solve Linear Bilevel Problems with Unknown Lower Level

Bilevel problems are used to model the interaction between two decision makers in which the lower-level problem, the so-called follower’s problem, appears as a constraint in the upper-level problem of the so-called leader. One issue in many practical situations is that the follower’s problem is not explicitly known by the leader. For such bilevel problems …

The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks

Neural networks tend to achieve better accuracy with training if they are larger, even if the resulting models are overparameterized. Nevertheless, carefully removing such excess parameters before, during, or after training may also produce models with similar or even improved accuracy. In many cases, that can be curiously achieved by heuristics as simple as …

Modeling Design and Control Problems Involving Neural Network Surrogates

We consider nonlinear optimization problems that involve surrogate models represented by neural networks. We demonstrate first how to directly embed neural network evaluation into optimization models, highlight a difficulty with this approach that can prevent convergence, and then characterize stationarity of such models. We then present two alternative formulations of these problems in the specific …

Scoring positive semidefinite cutting planes for quadratic optimization via trained neural networks

Semidefinite programming relaxations complement polyhedral relaxations for quadratic optimization, but global optimization solvers built on polyhedral relaxations cannot fully exploit this advantage. This paper develops linear outer-approximations of semidefinite constraints that can be effectively integrated into global solvers. The difference from previous work is that our proposed cuts are (i) sparser with respect to the …

Deep Neural Network Structures Solving Variational Inequalities

We propose a novel theoretical framework to investigate deep neural networks using the formalism of proximal fixed point methods for solving variational inequalities. We first show that almost all activation functions used in neural networks are actually proximity operators. This leads to an algorithmic model alternating firmly nonexpansive and linear operators. We derive new results …
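A standard instance of an activation function that is a proximity operator is ReLU: it is the proximity operator of the indicator function of the nonnegative half-line, i.e. the projection onto [0, ∞). The sketch below checks this numerically by brute force. The function names, the grid resolution, and the search range are assumptions made for the illustration, not part of the paper.

```python
def relu(x):
    # ReLU activation.
    return max(x, 0.0)


def prox_indicator_nonneg(x, grid_step=1e-3, y_max=10.0):
    # Brute-force proximity operator of the indicator of [0, inf):
    #   prox(x) = argmin over y >= 0 of 0.5 * (y - x)^2,
    # searched on a uniform grid over [0, y_max] (hypothetical range).
    n = int(round(y_max / grid_step))
    best_y, best_val = 0.0, 0.5 * x * x
    for i in range(1, n + 1):
        y = i * grid_step
        val = 0.5 * (y - x) ** 2
        if val < best_val:
            best_y, best_val = y, val
    return best_y


if __name__ == "__main__":
    for x in [-2.0, -0.5, 0.0, 0.7, 3.0]:
        # The two columns should agree up to the grid resolution.
        print(x, relu(x), prox_indicator_nonneg(x))
```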

Correlation analysis between the vibroacoustic behavior of steering gear and ball nut assemblies in the automotive industry

The increase in quality standards in the automotive industry requires specifications to be propagated across the supply chain, a challenge exacerbated in domains where the quality is subjective. In the daily operations of ThyssenKrupp Presta AG, requirements imposed on the vibroacoustic quality of steering gear need to be passed down to their subcomponents. We quantify …

Gradient Descent using Duality Structures

Gradient descent is commonly used to solve optimization problems arising in machine learning, such as training neural networks. Although it seems to be effective for many different neural network training problems, it is unclear whether the effectiveness of gradient descent can be explained using existing performance guarantees for the algorithm. We argue that existing analyses …