Accelerated projected gradient algorithms for sparsity constrained optimization problems

\(\) We consider the projected gradient algorithm for the nonconvex best subset selection problem that minimizes a given empirical loss function under an \(\ell_0\)-norm constraint. Through decomposing the feasible set of the given sparsity constraint as a finite union of linear subspaces, we present two acceleration schemes with global convergence guarantees, one by same-space extrapolation … Read more

Accelerating nuclear-norm regularized low-rank matrix optimization through Burer-Monteiro decomposition

This work proposes a rapid algorithm, BM-Global, for nuclear-norm-regularized convex and low-rank matrix optimization problems. BM-Global efficiently decreases the objective value via low-cost steps leveraging the nonconvex but smooth Burer-Monteiro (BM) decomposition, while effectively escapes saddle points and spurious local minima ubiquitous in the BM form to obtain guarantees of fast convergence rates to the … Read more

Global convergence and acceleration of projection methods for feasibility problems involving union convex sets

We prove global convergence of classical projection algorithms for feasibility problems involving union convex sets, which refer to sets expressible as the union of a finite number of closed convex sets. We present a unified strategy for analyzing global convergence by means of studying fixed-point iterations of a set-valued operator that is the union of … Read more

Training Structured Neural Networks Through Manifold Identification and Variance Reduction

This paper proposes an algorithm, RMDA, for training neural networks (NNs) with a regularization term for promoting desired structures. RMDA does not incur computation additional to proximal SGD with momentum, and achieves variance reduction without requiring the objective function to be of the finite-sum form. Through the tool of manifold identification from nonlinear optimization, we … Read more

Accelerating Inexact Successive Quadratic Approximation for Regularized Optimization Through Manifold Identification

For regularized optimization that minimizes the sum of a smooth term and a regularizer that promotes structured solutions, inexact proximal-Newton-type methods, or successive quadratic approximation (SQA) methods, are widely used for their superlinear convergence in terms of iterations. However, unlike the counter parts in smooth optimization, they suffer from lengthy running time in solving regularized … Read more

Limited-memory Common-directions Method for Large-scale Optimization: Convergence, Parallelization, and Distributed Optimization

In this paper, we present a limited-memory common-directions method for smooth optimization that interpolates between first- and second- order methods. At each iteration, a subspace of a limited dimension size is constructed using first-order information from previous iterations, and an ef- ficient Newton method is deployed to find an approximate minimizer within this subspace. With … Read more

Manifold Identification for Ultimately Communication-Efficient Distributed Optimization

This work proposes a progressive manifold identification approach for distributed optimization with sound theoretical justifications to greatly reduce both the rounds of communication and the bytes communicated per round for partly-smooth regularized problems such as the $\ell_1$- and group-LASSO-regularized ones. Our two-stage method first uses an inexact proximal quasi-Newton method to iteratively identify a sequence … Read more

A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization

We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving empirical risk minimization (ERM) problems with a nonsmooth regularization term. Our algorithm is applicable to both the primal and the dual ERM problem. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or … Read more

First-Order Algorithms Converge Faster than (1/k)$ on Convex Problems

It is well known that both gradient descent and stochastic coordinate descent achieve a global convergence rate of $O(1/k)$ in the objective value, when applied to a scheme for minimizing a Lipschitz-continuously differentiable, unconstrained convex function. In this work, we improve this rate to $o(1/k)$. We extend the result to proximal gradient and proximal coordinate … Read more

Inexact Variable Metric Stochastic Block-Coordinate Descent for Regularized Optimization

Block-coordinate descent (BCD) is a popular framework for large-scale regularized optimization problems with block-separable structure. Existing methods have several limitations. They often assume that subproblems can be solved exactly at each iteration, which in practical terms usually restricts the quadratic term in the subproblem to be diagonal, thus losing most of the benefits of higher-order … Read more