ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm

We study the problem of solving strongly convex and smooth unconstrained optimization problems using stochastic first-order algorithms. We devise a novel algorithm, referred to as \emph{Recursive One-Over-T SGD} (ROOTSGD), based on an easily implementable, recursive averaging of past stochastic gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and … Read more

Exact Penalty Function for L21 Norm Minimization over the Stiefel Manifold

L21 norm minimization with orthogonality constraints, feasible region of which is called Stiefel manifold, has wide applications in statistics and data science. The state-of-the-art approaches adopt proximal gradient technique on either Stiefel manifold or its tangent spaces. The consequent subproblem does not have closed-form solution and hence requires an iterative procedure to solve which is … Read more

The block mutual coherence property condition for signal recovery

Compressed sensing shows that a sparse signal can stably be recovered from incomplete linear measurements. But, in practical applications, some signals have additional structure, where the nonzero elements arise in some blocks. We call such signals as block-sparse signals. In this paper, the $\ell_2/\ell_1-\alpha\ell_2$ minimization method for the stable recovery of block-sparse signals is investigated. … Read more

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Structured non-convex learning problems, for which critical points have favorable statistical properties, arise frequently in statistical machine learning. Algorithmic convergence and statistical estimation rates are well-understood for such problems. However, quantifying the uncertainty associated with the underlying training algorithm is not well-studied in the non-convex setting. In order to address this short-coming, in this work, … Read more

Linear Programming and Community Detection

The problem of community detection with two equal-sized communities is closely related to the minimum graph bisection problem over certain random graph models. In the stochastic block model distribution over networks with community structure, a well-known semidefinite programming (SDP) relaxation of the minimum bisection problem recovers the underlying communities whenever possible. Motivated by their superior … Read more

Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than $p=100s$ of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a … Read more

The high-order block RIP for non-convex block-sparse compressed sensing

This paper concentrates on the recovery of block-sparse signals, which is not only sparse but also nonzero elements are arrayed into some blocks (clusters) rather than being arbitrary distributed all over the vector, from linear measurements. We establish high-order sufficient conditions based on block RIP to ensure the exact recovery of every block $s$-sparse signal … Read more

Safe screening rules for L0-Regression

We give safe screening rules to eliminate variables from regression with L0 regularization or cardinality constraint. These rules are based on guarantees that a feature may or may not be selected in an optimal solution. The screening rules can be computed from a convex relaxation solution in linear time, without solving the L0 optimization problem. … Read more

Learning Optimal Classification Trees: Strong Max-Flow Formulations

We consider the problem of learning optimal binary classification trees. Literature on the topic has burgeoned in recent years, motivated both by the empirical suboptimality of heuristic approaches and the tremendous improvements in mixed-integer programming (MIP) technology. Yet, existing approaches from the literature do not leverage the power of MIP to its full extent. Indeed, … Read more

Outlier detection in time series via mixed-integer conic quadratic optimization

We consider the problem of estimating the true values of a Wiener process given noisy observations corrupted by outliers. The problem considered is closely related to the Trimmed Least Squares estimation problem, a robust estimation procedure well-studied from a statistical standpoint but poorly understood from an optimization perspective. In this paper we show how to … Read more