Exterior-point Optimization for Nonconvex Learning

In this paper we present the nonconvex exterior-point optimization solver (NExOS), a novel first-order algorithm tailored to constrained nonconvex learning problems. We consider the problem of minimizing a convex function over nonconvex constraints, where the projection onto the constraint set is single-valued around local minima. A wide range of nonconvex learning problems have this structure, including …

Graph Recovery From Incomplete Moment Information

We investigate a class of moment problems, namely recovering a measure supported on the graph of a function from partial knowledge of its moments, as for instance in some problems of optimal transport or density estimation. We show that the sole knowledge of first degree moments of the function, namely linear measurements, is sufficient to …

Finite-Sample Guarantees for Wasserstein Distributionally Robust Optimization: Breaking the Curse of Dimensionality

Wasserstein distributionally robust optimization (DRO) aims to find robust and generalizable solutions by hedging against data perturbations in Wasserstein distance. Despite its recent empirical success in operations research and machine learning, existing performance guarantees for generic loss functions are either overly conservative due to the curse of dimensionality, or plausible only in large-sample asymptotics. …

Dual optimal design and the Christoffel-Darboux polynomial

The purpose of this short note is to show that the Christoffel-Darboux polynomial, useful in approximation theory and data science, arises naturally when deriving the dual to the problem of semi-algebraic D-optimal experimental design in statistics. It uses only elementary notions of convex analysis.

Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates

In this paper, we study smooth stochastic multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions. We assume access to noisy evaluations of the functions and their gradients, through a stochastic first-order oracle. For solving this class of problems, we propose two algorithms using moving-average stochastic estimates, and analyze …

Exact Penalty Function for L21 Norm Minimization over the Stiefel Manifold

L21 norm minimization with orthogonality constraints, whose feasible region is called the Stiefel manifold, has wide applications in statistics and data science. State-of-the-art approaches apply the proximal gradient technique on either the Stiefel manifold or its tangent spaces. The resulting subproblem does not have a closed-form solution and hence requires an iterative procedure to solve which is …
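As a minimal illustration of the two objects named in this abstract, the sketch below evaluates an L21 norm and checks the orthogonality constraint $X^\top X = I$ that defines the Stiefel manifold. It is not the paper's method. The row-wise convention for the L21 norm (sum of the Euclidean norms of the rows), the function names `l21_norm` and `is_on_stiefel`, and the example matrix are all assumptions made for illustration.

```python
import math


def l21_norm(X):
    """Sum of the Euclidean norms of the rows of X.

    Assumption: rows are the groups; some texts group by columns instead.
    """
    return sum(math.sqrt(sum(v * v for v in row)) for row in X)


def is_on_stiefel(X, tol=1e-9):
    """Return True if X^T X equals the identity up to tol,
    i.e. the columns of X are orthonormal."""
    n, p = len(X), len(X[0])
    for i in range(p):
        for j in range(p):
            gram_ij = sum(X[k][i] * X[k][j] for k in range(n))
            target = 1.0 if i == j else 0.0
            if abs(gram_ij - target) > tol:
                return False
    return True


# Hypothetical 3x2 matrix with orthonormal columns (a planar rotation
# padded with a zero row).
c, s = math.cos(0.3), math.sin(0.3)
X = [[c, -s], [s, c], [0.0, 0.0]]

print(is_on_stiefel(X))  # feasibility check
print(l21_norm(X))       # two unit-norm rows and one zero row
```

The zero row contributes nothing to the norm, which is why an L21 objective tends to favor row-sparse solutions.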

The block mutual coherence property condition for signal recovery

Compressed sensing shows that a sparse signal can be stably recovered from incomplete linear measurements. In practical applications, however, some signals have additional structure in which the nonzero elements occur in blocks. Such signals are called block-sparse. In this paper, the $\ell_2/\ell_1-\alpha\ell_2$ minimization method for the stable recovery of block-sparse signals is investigated. …
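To make the notation concrete, the sketch below evaluates one natural reading of the $\ell_2/\ell_1-\alpha\ell_2$ objective: the sum of the block-wise $\ell_2$ norms minus $\alpha$ times the $\ell_2$ norm of the whole vector. This reading, the equal-size contiguous blocks, and the function name `block_l2_l1_minus_alpha_l2` are assumptions for illustration, not details taken from the paper.

```python
import math


def block_l2_l1_minus_alpha_l2(x, block_size, alpha):
    """Evaluate sum_i ||x[i]||_2 - alpha * ||x||_2 for a vector x split
    into contiguous blocks of equal size (an assumed block layout)."""
    if len(x) % block_size != 0:
        raise ValueError("len(x) must be a multiple of block_size")
    blocks = [x[i:i + block_size] for i in range(0, len(x), block_size)]
    # Mixed l2/l1 norm: l2 norm inside each block, summed across blocks.
    mixed = sum(math.sqrt(sum(v * v for v in b)) for b in blocks)
    # Plain l2 norm of the whole vector.
    l2 = math.sqrt(sum(v * v for v in x))
    return mixed - alpha * l2


# Hypothetical signal with a single nonzero block out of three.
x_one_block = [3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
# Same energy spread over two blocks.
x_two_blocks = [3.0, 0.0, 4.0, 0.0, 0.0, 0.0]

print(block_l2_l1_minus_alpha_l2(x_one_block, 2, 1.0))
print(block_l2_l1_minus_alpha_l2(x_two_blocks, 2, 1.0))
```

With $\alpha = 1$ the value is zero when only one block is nonzero and positive when the same energy is spread over two blocks, so under this reading the penalty favors block-sparse vectors.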

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Structured non-convex learning problems, for which critical points have favorable statistical properties, arise frequently in statistical machine learning. Algorithmic convergence and statistical estimation rates are well-understood for such problems. However, quantifying the uncertainty associated with the underlying training algorithm is not well-studied in the non-convex setting. In order to address this shortcoming, in this work, …

Linear Programming and Community Detection

The problem of community detection with two equal-sized communities is closely related to the minimum graph bisection problem over certain random graph models. In the stochastic block model distribution over networks with community structure, a well-known semidefinite programming (SDP) relaxation of the minimum bisection problem recovers the underlying communities whenever possible. Motivated by their superior …

Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components when the number of variables $p$ exceeds the hundreds. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a …