Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks

In this work, we instantiate a regularized form of the gradient clipping algorithm and prove that it can converge to the global minima of deep neural network loss functions provided that the net is of sufficient width. We present empirical evidence that our theoretically founded regularized gradient clipping algorithm is also competitive with the state-of-the-art … Read more

Predictive Low Rank Matrix Learning under Partial Observations: Mixed-Projection ADMM

\(\) We study the problem of learning a partially observed matrix under the low rank assumption in the presence of fully observed side information that depends linearly on the true underlying matrix. This problem consists of an important generalization of the Matrix Completion problem, a central problem in Statistics, Operations Research and Machine Learning, that … Read more

A Stochastic Objective-Function-Free Adaptive Regularization Method with Optimal Complexity

\(\) A fully stochastic second-order adaptive-regularization method for unconstrained nonconvex optimization is presented which never computes the objective-function value, but yet achieves the optimal $\mathcal{O}(\epsilon^{-3/2})$ complexity bound for finding first-order critical points. The method is noise-tolerant and the inexactness conditions required for convergence depend on the history of past steps. Applications to cases where derivative … Read more

Complexity of Adagrad and other first-order methods for nonconvex optimization problems with bounds and convex constraints

A parametric class of trust-region algorithms for constrained nonconvex optimization is analyzed, where the objective function is never computed. By defining appropriate first-order stationarity criteria, we are able to extend the Adagrad method to the newly considered problem and retrieve the standard complexity rate of the projected gradient method that uses both the gradient and … Read more

An Extended Validity Domain for Constraint Learning

We consider embedding a predictive machine-learning model within a prescriptive optimization problem. In this setting, called constraint learning, we study the concept of a validity domain, i.e., a constraint added to the feasible set, which keeps the optimization close to the training data, thus helping to ensure that the computed optimal solution exhibits less prediction … Read more

A mathematical introduction to SVMs with self-concordant kernel

A derivation of so-called “soft-margin Support Vector Machines with kernel” is presented which does not rely on concepts from functional analysis such as Mercer’s theorem that is frequently cited in this context, and that leads to a new analysis of the continuity properties of the kernel functions such as a new self-concordance condition for the … Read more

Optimal counterfactual explanations for k-Nearest Neighbors using Mathematical Optimization and Constraint Programming

\(\) Within the topic of explainable AI, counterfactual explanations to classifiers have received significant recent attention. We study counterfactual explanations that try to explain why a data point received an undesirable classification by providing the closest data point that would have received a desirable one. Within the context of one the simplest and most popular … Read more

An Integer Programming Approach To Subspace Clustering With Missing Data

In the Subspace Clustering with Missing Data (SCMD) problem, we are given a collection of n partially observed d-dimensional vectors. The data points are assumed to be concentrated near a union of low-dimensional subspaces. The goal of SCMD is to cluster the vectors according to their subspace membership and recover the underlying basis, which can … Read more

Conjecturing-Based Discovery of Patterns in Data

We propose the use of a conjecturing machine that suggests feature relationships in the form of bounds involving nonlinear terms for numerical features and boolean expressions for categorical features. The proposed Conjecturing framework recovers known nonlinear and boolean relationships among features from data. In both settings, true underlying relationships are revealed. We then compare the … Read more

Learning the Follower’s Objective Function in Sequential Bilevel Games

We consider bilevel optimization problems in which the leader has no or only partial knowledge about the objective function of the follower. The studied setting is a sequential one in which the bilevel game is played repeatedly. This allows the leader to learn the objective function of the follower over time. We focus on two … Read more