Computational complexity of decomposing a symmetric matrix as a sum of positive semidefinite and diagonal matrices

We study several variants of decomposing a symmetric matrix into a sum of a low-rank positive semidefinite matrix and a diagonal matrix. Such decompositions have applications in factor analysis and they have been studied for many decades. On the one hand, we prove that when the rank of the positive semidefinite matrix in the decomposition …
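As a minimal illustration of the object being studied (not the paper's algorithm), the following NumPy snippet builds a symmetric matrix with exactly this structure; the size $n$ and rank $r$ are arbitrary assumptions of ours:

```python
# Sketch: construct A = P + D with P positive semidefinite of low rank
# and D diagonal, then verify the structure. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2                        # dimension and target rank (assumed)
F = rng.standard_normal((n, r))
P = F @ F.T                        # PSD, rank <= r
D = np.diag(rng.uniform(0.1, 1.0, size=n))
A = P + D                          # symmetric matrix to be decomposed

print(np.linalg.matrix_rank(P))               # 2 (low rank)
print(np.linalg.eigvalsh(P).min() >= -1e-10)  # True (PSD)
print(np.allclose(A, A.T))                    # True (symmetric)
```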

Nonlinear conjugate gradient for smooth convex functions

The method of nonlinear conjugate gradients (NCG) is widely used in practice for unconstrained optimization, but it satisfies weak complexity bounds at best when applied to smooth convex functions. In contrast, Nesterov’s accelerated gradient (AG) method is optimal up to constant factors for this class. However, when specialized to quadratic functions, conjugate gradient is optimal …
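For a concrete, purely illustrative instance of NCG on a smooth convex function, SciPy's `CG` method implements a Polak-Ribière nonlinear conjugate gradient; the test function below is our own choice, not one from the paper:

```python
# Sketch: nonlinear conjugate gradient (SciPy's 'CG', Polak-Ribiere)
# on a smooth convex function: a positive-definite quadratic plus a
# softplus term. Problem data are arbitrary assumptions.
import numpy as np
from scipy.optimize import minimize

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x + np.log1p(np.exp(x[0] + x[1]))

res = minimize(f, x0=np.zeros(2), method="CG")
print(res.x, res.fun, res.nit)   # minimizer, value, iteration count
```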

On identifying clusters from sum-of-norms clustering computation

Sum-of-norms clustering is a clustering formulation based on convex optimization that automatically induces hierarchy. Multiple algorithms have been proposed to solve the optimization problem: subgradient descent by Hocking et al.\ \cite{hocking}, ADMM and AMA by Chi and Lange\ \cite{Chi}, the stochastic incremental algorithm by Panahi et al.\ \cite{Panahi}, and the semismooth Newton-CG augmented Lagrangian method by Yuan …
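For reference, the convex program that all of these algorithms target is the standard sum-of-norms clustering objective (notation ours): given data points $a_1,\dots,a_n\in\mathbb{R}^d$ and a fusion parameter $\lambda>0$,
\[
\min_{x_1,\dots,x_n\in\mathbb{R}^d}\;
\frac{1}{2}\sum_{i=1}^{n}\|x_i-a_i\|_2^2
\;+\;\lambda\sum_{1\le i<j\le n}\|x_i-x_j\|_2,
\]
with points $i$ and $j$ assigned to the same cluster exactly when the optimal $x_i^*=x_j^*$; increasing $\lambda$ fuses more points, which is the source of the induced hierarchy.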

Provable Overlapping Community Detection in Weighted Graphs

Community detection is a widely studied unsupervised learning problem in which the task is to group similar entities together based on observed pairwise entity interactions. This problem has applications in diverse domains such as social network analysis and computational biology. There is a significant amount of literature studying this problem under the assumption that the communities …

A termination criterion for stochastic gradient descent for binary classification

We propose a new, simple, and computationally inexpensive termination test for constant step-size stochastic gradient descent (SGD) applied to binary classification with the logistic and hinge losses and homogeneous linear predictors. Our theoretical results support the effectiveness of our stopping criterion when the data is Gaussian distributed. The presence of noise allows for the possibility …
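The sketch below sets up this exact regime (constant step-size SGD, logistic loss, homogeneous linear predictor, Gaussian data); the stopping rule in it is a generic placeholder of ours, not the test proposed in the paper:

```python
# Sketch: constant step-size SGD for logistic-loss binary classification
# on synthetic Gaussian data. The termination test here (small moving
# average of update sizes) is a hypothetical placeholder, NOT the
# criterion from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, step = 10, 0.05                  # dimension and step size (assumed)
w_true = rng.standard_normal(d)     # ground-truth linear predictor

w = np.zeros(d)
avg_move = 1.0                      # exponential moving average of steps
for t in range(1, 200_001):
    x = rng.standard_normal(d)                   # Gaussian features
    y = 1.0 if x @ w_true > 0 else -1.0          # noiseless labels here
    grad = -y * x / (1.0 + np.exp(y * (w @ x)))  # logistic-loss gradient
    w -= step * grad
    avg_move = 0.999 * avg_move + 0.001 * np.linalg.norm(step * grad)
    if avg_move < 1e-3:             # hypothetical termination test
        break

cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print(t, cos)                       # iterations used, alignment with w_true
```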

Potential-based analyses of first-order methods for constrained and composite optimization

We propose potential-based analyses for first-order algorithms applied to constrained and composite minimization problems. We first propose “idealized” frameworks for algorithms in the strongly and non-strongly convex cases and argue based on a potential that methods following the framework achieve the best possible rate. Then we show that the geometric descent (GD) algorithm by Bubeck …
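As a classic example of this style of argument (not the specific potentials constructed in the paper): for gradient descent with step size $1/L$ on an $L$-smooth convex $f$, the potential
\[
\Phi_k \;=\; k\bigl(f(x_k)-f(x^*)\bigr)\;+\;\frac{L}{2}\,\|x_k-x^*\|^2
\]
is nonincreasing in $k$, which immediately yields the rate $f(x_k)-f(x^*)\le L\|x_0-x^*\|^2/(2k)$.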

Recovery of a mixture of Gaussians by sum-of-norms clustering

Sum-of-norms clustering is a method for assigning $n$ points in $\mathbb{R}^d$ to $K$ clusters, $1\le K\le n$, using convex optimization. Recently, Panahi et al.\ proved that sum-of-norms clustering is guaranteed to recover a mixture of Gaussians under the restriction that the number of samples is not too large. The purpose of this note is to …
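A small numerical sketch of this recovery setting (sample sizes, variances, and the fusion parameter are all our own assumed values): solve the sum-of-norms program directly with CVXPY on samples from a two-component Gaussian mixture and read clusters off the fused optimal points:

```python
# Sketch: sum-of-norms clustering via CVXPY on a mixture of two
# Gaussians. All problem parameters below are illustrative assumptions.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 20, 2
a = np.vstack([rng.normal(0.0, 0.2, (n // 2, d)),    # component 1
               rng.normal(3.0, 0.2, (n // 2, d))])   # component 2

lam = 0.15
x = cp.Variable((n, d))
fit = 0.5 * cp.sum_squares(x - a)
fusion = sum(cp.norm(x[i] - x[j], 2)
             for i in range(n) for j in range(i + 1, n))
cp.Problem(cp.Minimize(fit + lam * fusion)).solve()

# points whose optimal x_i coincide (up to rounding) share a cluster
_, labels = np.unique(np.round(x.value, 2), axis=0, return_inverse=True)
print(labels)
```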

A single potential governing convergence of conjugate gradient, accelerated gradient and geometric descent

Nesterov’s accelerated gradient (AG) method for minimizing a smooth strongly convex function $f$ is known to reduce $f({\bf x}_k)-f({\bf x}^*)$ by a factor of $\epsilon\in(0,1)$ after $k=O(\sqrt{L/\ell}\log(1/\epsilon))$ iterations, where $\ell,L$ are the two parameters of smooth strong convexity. Furthermore, it is known that this is the best possible complexity in the function-gradient oracle model of …
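For context, one standard potential of this kind for AG in the strongly convex case (not necessarily the single unified potential constructed in the paper) is
\[
E_k \;=\; f({\bf x}_k)-f({\bf x}^*)\;+\;\frac{\ell}{2}\,\|{\bf v}_k-{\bf x}^*\|^2,
\qquad
E_{k+1}\;\le\;\Bigl(1-\sqrt{\ell/L}\Bigr)E_k,
\]
where ${\bf v}_k$ is the auxiliary sequence of the method; iterating the inequality recovers the $O(\sqrt{L/\ell}\log(1/\epsilon))$ complexity above.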

A unified convergence bound for conjugate gradient and accelerated gradient

Nesterov’s accelerated gradient method for minimizing a smooth strongly convex function $f$ is known to reduce $f({\bf x}_k)-f({\bf x}^*)$ by a factor of $\epsilon\in(0,1)$ after $k=O(\sqrt{L/\ell}\log(1/\epsilon))$ iterations, where $\ell,L$ are the two parameters of smooth strong convexity. Furthermore, it is known that this is the best possible complexity in the function-gradient oracle model of computation. The …
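For the quadratic special case, in which conjugate gradient is exactly optimal, a minimal self-contained implementation (our own toy instance, not the paper's experiments) is:

```python
# Sketch: linear conjugate gradient on f(x) = 0.5 x'Ax - b'x with A
# symmetric positive definite. Problem data are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 30
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # SPD, moderately well conditioned
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x                      # residual = negative gradient
p = r.copy()
for k in range(n):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)     # exact line search along p
    x += alpha * p
    r_new = r - alpha * Ap
    if np.linalg.norm(r_new) < 1e-10:
        break
    p = r_new + ((r_new @ r_new) / (r @ r)) * p   # next conjugate direction
    r = r_new

print(k + 1, np.linalg.norm(A @ x - b))   # iterations, final residual
```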

Finding the largest low-rank clusters with Ky Fan 2-k-norm and l1-norm

We propose a convex optimization formulation with the Ky Fan 2-k-norm and l1-norm to find the k largest approximately rank-one submatrix blocks of a given nonnegative matrix that has low-rank block diagonal structure with noise. We analyze low-rank and sparsity structures of the optimal solutions using properties of these two matrix norms. We show that, under …
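For reference, the Ky Fan 2-$k$-norm of a matrix $A$ is commonly defined as
\[
\|A\|_{k,2}\;=\;\Bigl(\sum_{i=1}^{k}\sigma_i(A)^2\Bigr)^{1/2},
\]
where $\sigma_1(A)\ge\sigma_2(A)\ge\cdots$ are the singular values of $A$; for $k=1$ it reduces to the spectral norm and for full $k$ to the Frobenius norm.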