Implicit Regularization of Sub-Gradient Method in Robust Matrix Recovery: Don’t be Afraid of Outliers

It is well-known that simple short-sighted algorithms, such as gradient descent, generalize well in the over-parameterized learning tasks, due to their implicit regularization. However, it is unknown whether the implicit regularization of these algorithms can be extended to robust learning tasks, where a subset of samples may be grossly corrupted with noise. In this work, … Read more

Scalable Inference of Sparsely-changing Markov Random Fields with Strong Statistical Guarantees

In this paper, we study the problem of inferring time-varying Markov random fields (MRF), where the underlying graphical model is both sparse and changes sparsely over time. Most of the existing methods for the inference of time-varying MRFs rely on the regularized maximum likelihood estimation (MLE), that typically suffer from weak statistical guarantees and high … Read more

Strong Optimal Classification Trees

Decision trees are among the most popular machine learning models and are used routinely in applications ranging from revenue management and medicine to bioinformatics. In this paper, we consider the problem of learning optimal binary classification trees with univariate splits. Literature on the topic has burgeoned in recent years, motivated both by the empirical suboptimality … Read more

Kernel Distributionally Robust Optimization

We propose kernel distributionally robust optimization (Kernel DRO) using insights from the robust optimization theory and functional analysis. Our method uses reproducing kernel Hilbert spaces (RKHS) to construct a wide range of convex ambiguity sets, including sets based on integral probability metrics and finite-order moment bounds. This perspective unifies multiple existing robust and stochastic optimization … Read more

An Alternating Method for Cardinality-Constrained Optimization: A Computational Study for the Best Subset Selection and Sparse Portfolio Problems

Cardinality-constrained optimization problems are notoriously hard to solve both in theory and practice. However, as famous examples such as the sparse portfolio optimization and best subset selection problems show, this class is extremely important in real-world applications. In this paper, we apply a penalty alternating direction method to these problems. The key idea is to … Read more

Sparse Poisson regression via mixed-integer optimization

We present a mixed-integer optimization (MIO) approach to sparse Poisson regression. The MIO approach to sparse linear regression was first proposed in the 1970s, but has recently received renewed attention due to advances in optimization algorithms and computer hardware. In contrast to many sparse estimation algorithms, the MIO approach has the advantage of finding the … Read more

Exterior-point Optimization for Nonconvex Learning

In this paper we present the nonconvex exterior-point optimization solver (NExOS)—a novel first-order algorithm tailored to constrained nonconvex learning problems. We consider the problem of minimizing a convex function over nonconvex constraints, where the projection onto the constraint set is single-valued around local minima. A wide range of nonconvex learning problems have this structure including … Read more

Graph Recovery From Incomplete Moment Information

We investigate a class of moment problems, namely recovering a measure supported on the graph of a function from partial knowledge of its moments, as for instance in some problems of optimal transport or density estimation. We show that the sole knowledge of first degree moments of the function, namely linear measurements, is sufficient to … Read more

Finite-Sample Guarantees for Wasserstein Distributionally Robust Optimization: Breaking the Curse of Dimensionality

Wasserstein distributionally robust optimization (DRO) aims to find robust and generalizable solutions by hedging against data perturbations in Wasserstein distance. Despite its recent empirical success in operations research and machine learning, existing performance guarantees for generic loss functions are either overly conservative due to the curse of dimensionality, or plausible only in large sample asymptotics. … Read more

Dual optimal design and the Christoffel-Darboux polynomial

The purpose of this short note is to show that the Christoffel-Darboux polynomial, useful in approximation theory and data science, arises naturally when deriving the dual to the problem of semi-algebraic D-optimal experimental design in statistics. It uses only elementary notions of convex analysis. ArticleDownload View PDF