A mathematical introduction to SVMs with self-concordant kernel

A derivation of so-called “soft-margin support vector machines with kernel” is presented, along with elementary proofs that do not rely on concepts from functional analysis, such as Mercer’s theorem or reproducing kernel Hilbert spaces, which are frequently cited in this context. The analysis leads to new continuity properties of the kernel functions, in particular a …
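For reference, the standard soft-margin kernel SVM that such derivations arrive at can be stated as the following dual problem over training points (x_i, y_i), i = 1, …, n (this is the textbook formulation, not the paper’s specific proof route):

```latex
% Soft-margin SVM dual with kernel k and regularization parameter C
\max_{\alpha \in \mathbb{R}^n} \quad \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, k(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0.
```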

Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances

Optimal transport has been very successful for various machine learning tasks; however, it is known to suffer from the curse of dimensionality. Hence, dimensionality reduction is desirable when optimal transport is applied to high-dimensional data with low-dimensional structure. The kernel max-sliced (KMS) Wasserstein distance is developed for this purpose by finding an optimal nonlinear mapping that reduces data into …
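As a rough illustration of the “max-sliced” idea in the linear case (the kernel variant instead optimizes over nonlinear maps in an RKHS), here is a minimal sketch; the crude random search over directions and all function names are assumptions for illustration, not the paper’s algorithm:

```python
import numpy as np

def sliced_w2_sq(X, Y, theta):
    """Squared 1-D Wasserstein-2 between the projections X@theta and Y@theta.
    Assumes X and Y have the same number of rows (equal uniform weights),
    so the optimal 1-D coupling is the sorted (monotone) matching."""
    px, py = np.sort(X @ theta), np.sort(Y @ theta)
    return np.mean((px - py) ** 2)

def max_sliced_w2(X, Y, n_dirs=2000, seed=0):
    """Crude random-search lower bound on the (linear) max-sliced W2 distance."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_dirs):
        theta = rng.standard_normal(X.shape[1])
        theta /= np.linalg.norm(theta)       # restrict to unit directions
        best = max(best, sliced_w2_sq(X, Y, theta))
    return np.sqrt(best)
```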

A graph-structured distance for mixed-variable domains with meta variables

Heterogeneous datasets emerge in various machine learning and optimization applications that feature different input sources, types, or formats. Most models or methods do not natively handle heterogeneity. Hence, such datasets are often partitioned into smaller, simpler ones, which may limit generalizability or performance, especially if data is limited. The first main contribution of …

Mixed-Integer Linear Optimization for Cardinality-Constrained Random Forests

Random forests are among the most popular algorithms for solving classification problems, in particular for large-scale data sets. Given a set of labeled points and several decision trees, the method classifies a new point by taking the majority vote of the trees; a minimal sketch of this baseline voting rule appears below. In some scenarios, however, labels are only accessible for a proper subset of the given …
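The voting rule itself is simple (illustrative only, before any cardinality constraints are imposed):

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Classify a point by the majority vote of an ensemble's per-tree labels."""
    return Counter(tree_predictions).most_common(1)[0][0]

# e.g. five trees voting on one point:
print(majority_vote(["cat", "dog", "cat", "cat", "dog"]))  # -> "cat"
```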

A Proximal-Gradient Method for Constrained Optimization

We present a new algorithm for solving optimization problems whose objective function is the sum of a smooth function and a (potentially) nonsmooth regularization function, subject to nonlinear equality constraints. The algorithm may be viewed as an extension of the well-known proximal-gradient method, which is applicable when constraints are not present. To account for nonlinear …
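For context, the unconstrained baseline being extended is the classical proximal-gradient iteration x⁺ = prox_{αr}(x − α∇f(x)); a minimal sketch for the ℓ1-regularized case (this is the standard method, not the paper’s constrained extension):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(grad_f, x0, alpha, lam, n_iters=200):
    """Classical proximal-gradient method for min_x f(x) + lam*||x||_1,
    with a fixed step size alpha; no constraints are handled here."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        x = soft_threshold(x - alpha * grad_f(x), alpha * lam)
    return x
```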

Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation

We apply PAC-Bayesian theory to the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-Bayesian bounds) and an explicit trade-off between convergence guarantees and convergence speed, which contrasts with the typical worst-case analysis. Our learned optimization algorithms provably outperform related ones …

Stochastic Aspects of Dynamical Low-Rank Approximation in the Context of Machine Learning

The central challenges of today’s neural network architectures are the prohibitive memory footprint and training costs associated with determining optimal weights and biases. A large portion of research in machine learning is therefore dedicated to constructing memory-efficient training methods. One promising approach is dynamical low-rank training (DLRT), which represents and trains parameters as a low-rank …
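To see where the memory savings come from, a small sketch of the low-rank parameterization W ≈ U S Vᵀ that DLRT-style methods train (a parameter-count illustration only, not the DLRT integrator itself):

```python
import numpy as np

m, n, r = 512, 512, 16                      # layer dimensions and chosen rank
U = np.linalg.qr(np.random.randn(m, r))[0]  # orthonormal left factor  (m x r)
V = np.linalg.qr(np.random.randn(n, r))[0]  # orthonormal right factor (n x r)
S = np.random.randn(r, r)                   # small trainable core     (r x r)
W = U @ S @ V.T                             # rank-r weight matrix

print(m * n)            # 262144 parameters for a dense layer
print(r * (m + n + r))  # 16640 parameters for the low-rank factors
```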

Data Collaboration Analysis with Orthonormal Basis Selection and Alignment

Data Collaboration (DC) enables multiple parties to jointly train a model by sharing only linear projections of their private datasets. The core challenge in DC is to align the bases of these projections without revealing each party’s secret basis. While existing theory suggests that any target basis spanning the common subspace should suffice, in practice, …
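A schematic of the sharing step as described; the shared “anchor” dataset used for alignment is an assumption borrowed from common DC-style protocols, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 5
X_private = rng.standard_normal((100, d))  # one party's raw data (never shared)
X_anchor  = rng.standard_normal((30, d))   # public anchor data known to all parties
F = rng.standard_normal((d, k))            # this party's secret projection basis

# Only the projections leave the party; alignment maps between parties are
# later fit on the anchor projections, so F itself is never revealed.
shared_data   = X_private @ F
shared_anchor = X_anchor @ F
```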

Robust support vector machines via conic optimization

We consider the problem of learning support vector machines that are robust to uncertainty. It has been established in the literature that typical loss functions, including the hinge loss, are sensitive to data perturbations and outliers, and thus perform poorly in the setting considered. In contrast, using the 0-1 loss or a suitable non-convex approximation results in robust …
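The two losses being contrasted, written in terms of the margin z = y f(x): the convex but outlier-sensitive hinge loss versus the robust but non-convex 0-1 loss:

```latex
\ell_{\mathrm{hinge}}(z) = \max\{0,\, 1 - z\},
\qquad
\ell_{0\text{-}1}(z) = \mathbb{1}[\, z \le 0 \,],
\qquad z = y\, f(x).
```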

Data-Driven Reliable Facility Location Design

We study the reliable (uncapacitated) facility location (RFL) problem in a data-driven environment where historical observations of random demands and disruptions are available. Owing to the combinatorial nature of the RFL problem and the mixed-binary randomness of its parameters, state-of-the-art RFL models applied to the data-driven setting either suggest overly conservative solutions, or …