On the Optimization Landscape of Burer-Monteiro Factorization: When do Global Solutions Correspond to Ground Truth?

In low-rank matrix recovery, the goal is to recover a low-rank matrix, given a limited number of linear and possibly noisy measurements. Low-rank matrix recovery is typically solved via a nonconvex method called Burer-Monteiro factorization (BM). If the rank of the ground truth is known, BM is free of sub-optimal local solutions, and its true solutions … Read more

Variable Selection for Kernel Two-Sample Tests

We consider the variable selection problem for two-sample tests, aiming to select the most informative features to best distinguish samples from two groups. We propose a kernel maximum mean discrepancy (MMD) framework to solve this problem and further derive its equivalent mixed-integer programming formulations for linear, quadratic, and Gaussian types of kernel functions. Our proposed … Read more

Unboxing Tree Ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree

Article Download View Unboxing Tree Ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree

Multilevel Objective-Function-Free Optimization with an Application to Neural Networks Training

A class of multi-level algorithms for unconstrained nonlinear optimization is presented which does not require the evaluation of the objective function. The class contains the momentum-less AdaGrad method as a particular (single-level) instance. The choice of avoiding the evaluation of the objective function is intended to make the algorithms of the class less sensitive to … Read more

A classification method based on a cloud of spheres

\(\) In this article we propose a binary classification model to distinguish a specific class that corresponds to a characteristic that we intend to identify (fraud, spam, disease). The classification model is based on a cloud of spheres that circumscribe the points of the class to be identified. It is intended to build a model … Read more

Analyzing Inexact Hypergradients for Bilevel Learning

Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters cannot be feasibly computed and approximate strategies are required. We introduce a unified framework for computing hypergradients that … Read more

A Levenberg-Marquardt Method for Nonsmooth Regularized Least Squares

\(\) We develop a Levenberg-Marquardt method for minimizing the sum of a smooth nonlinear least-squares term \(f(x) = \frac{1}{2} \|F(x)\|_2^2\) and a nonsmooth term \(h\). Both \(f\) and \(h\) may be nonconvex. Steps are computed by minimizing the sum of a regularized linear least-squares model and a model of \(h\) using a first-order method such … Read more

Data-driven Prediction of Relevant Scenarios for Robust Combinatorial Optimization

We study iterative methods for (two-stage) robust combinatorial optimization problems with discrete uncertainty. We propose a machine-learning-based heuristic to determine starting scenarios that provide strong lower bounds. To this end, we design dimension-independent features and train a Random Forest Classifier on small-dimensional instances. Experiments show that our method improves the solution process for larger instances … Read more

A mixed-integer exponential cone programming formulation for feature subset selection in logistic regression

Logistic regression is one of the widely-used classification tools to construct prediction models. For datasets with a large number of features, feature subset selection methods are considered to obtain accurate and interpretable prediction models, in which irrelevant and redundant features are removed. In this paper, we address the problem of feature subset selection in logistic … Read more

Expected Value of Matrix Quadratic Forms with Wishart distributed Random Matrices

To explore the limits of a stochastic gradient method, it may be useful to consider an example consisting of an infinite number of quadratic functions. In this context, it is appropriate to determine the expected value and the covariance matrix of the stochastic noise, i.e. the difference of the true gradient and the approximated gradient … Read more