Data Science Theory – Optimization Online

Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls

Published: 2024/07/18, Updated: 2024/10/18

Data Science Theory, Robust Optimization

Adversarially robust optimization (ARO) has become the de facto standard for training models to defend against adversarial attacks during testing. However, despite their robustness, these models often suffer from severe overfitting. To mitigate this issue, several successful approaches have been proposed, including replacing the empirical distribution in training with: (i) a worst-case distribution within an …

A graph-structured distance for mixed-variable domains with meta variables

Published: 2024/05/20, Updated: 2024/08/22

Data Science Theory, Optimization in Data Science

distances, heterogeneous datasets, machine learning, meta variables

Heterogeneous datasets emerge in various machine learning and optimization applications that feature different input sources, types or formats. Most models or methods do not natively tackle heterogeneity. Hence, such datasets are often partitioned into smaller and simpler ones, which may limit the generalizability or performance, especially if data is limited. The first main contribution of …

Stochastic Aspects of Dynamical Low-Rank Approximation in the Context of Machine Learning

Published: 2024/03/23, Updated: 2024/05/15

Data Science Theory, Nonlinear Optimization, Optimization in Data Science

deep neural networks, Dynamical Low-Rank Approximation (DLRA), Dynamical Low-Rank Training (DLRT), machine learning, stochastic gradient descent

The central challenges of today’s neural network architectures are the prohibitive memory footprint and the training costs associated with determining optimal weights and biases. A large portion of research in machine learning is therefore dedicated to constructing memory-efficient training methods. One promising approach is dynamical low-rank training (DLRT) which represents and trains parameters as a …

Expected Value of Matrix Quadratic Forms with Wishart distributed Random Matrices

Published: 2022/12/02, Updated: 2022/12/13

Melinda Hagedorn

Convex Optimization, Data Science Theory, Stochastic Approaches

averaging, expected value, quadratic form, second momentum, stochastic gradient method, Wishart distribution

To explore the limits of a stochastic gradient method, it may be useful to consider an example consisting of an infinite number of quadratic functions. In this context, it is appropriate to determine the expected value and the covariance matrix of the stochastic noise, i.e. the difference of the true gradient and the approximated gradient …

A polyhedral study of multivariate decision trees

Published: 2022/11/14

Carla Michini

Zachary Zhou

(Mixed) Integer Linear Programming, Data Science Theory, Polyhedra

facet-defining inequality, mixed-integer programming, optimal decision tree

Decision trees are a widely used tool for interpretable machine learning. Multivariate decision trees employ hyperplanes at the branch nodes to route datapoints throughout the tree and yield more compact models than univariate trees. Recently, mixed-integer programming (MIP) has been applied to formulate the optimal decision tree problem. To strengthen MIP formulations, it is crucial …

Wasserstein Regularization for 0-1 Loss

Published: 2022/10/17

Rui Gao

Data Science Theory, Robust Optimization, Stochastic Programming

Wasserstein distributionally robust optimization (DRO) finds robust solutions by hedging against data perturbation specified by distributions in a Wasserstein ball. The robustness is linked to the regularization effect, which has been studied for continuous losses in various settings. However, existing results cannot be simply applied to the 0-1 loss, which is frequently seen in uncertainty …

Sparse PCA With Multiple Components

Published: 2022/09/29, Updated: 2023/10/31

Ryan Cory-Wright

Jean Pauphilet

Data Science Theory, Data-Mining, Semi-definite Programming

Sparse Principal Component Analysis (sPCA) is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. This involves solving a sparsity and orthogonality-constrained convex maximization problem, which is extremely computationally challenging. Most existing works address sparse PCA via methods—such as iteratively computing …

Optimized convergence of stochastic gradient descent by weighted averaging

Published: 2022/09/23, Updated: 2022/10/05

Melinda Hagedorn

Florian Jarre

Convex Optimization, Data Science Theory, Stochastic Approaches

convex optimization, noise, optimal step lengths, optimal weights, stochastic gradient descent, weighted averaging

Under mild assumptions stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the arithmetic mean of all iterates converges considerably slower to the optimal solution than the iterates themselves. And also in the …