Training Structured Neural Networks Through Manifold Identification and Variance Reduction

This paper proposes an algorithm, RMDA, for training neural networks (NNs) with a regularization term for promoting desired structures. RMDA does not incur computation additional to proximal SGD with momentum, and achieves variance reduction without requiring the objective function to be of the finite-sum form. Through the tool of manifold identification from nonlinear optimization, we … Read more

Accelerating Inexact Successive Quadratic Approximation for Regularized Optimization Through Manifold Identification

For regularized optimization that minimizes the sum of a smooth term and a regularizer that promotes structured solutions, inexact proximal-Newton-type methods, or successive quadratic approximation (SQA) methods, are widely used for their superlinear convergence in terms of iterations. However, unlike the counter parts in smooth optimization, they suffer from lengthy running time in solving regularized … Read more

Manifold Identification for Ultimately Communication-Efficient Distributed Optimization

This work proposes a progressive manifold identification approach for distributed optimization with sound theoretical justifications to greatly reduce both the rounds of communication and the bytes communicated per round for partly-smooth regularized problems such as the $\ell_1$- and group-LASSO-regularized ones. Our two-stage method first uses an inexact proximal quasi-Newton method to iteratively identify a sequence … Read more

Active Set Complexity of the Away-step Frank-Wolfe Algorithm

In this paper, we study active set identification results for the away-step Frank-Wolfe algorithm in different settings. We first prove a local identification property that we apply, in combination with a convergence hypothesis, to get an active set identification result. We then prove, in the nonconvex case, a novel O(1/ √k) convergence rate result and … Read more

First-order methods for the impatient: support identification in finite time with convergent Frank-Wolfe variants

In this paper, we focus on the problem of minimizing a non-convex function over the unit simplex. We analyze two well-known and widely used variants of the Frank-Wolfe algorithm and first prove global convergence of the iterates to stationary points both when using exact and Armijo line search. Then we show that the algorithms identify … Read more

Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

Iterative methods that calculate their steps from approximate subgradient directions have proved to be useful for stochastic learning problems over large and streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, the solution often lies on a low-dimensional manifold of parameter space along which the regularizer is smooth. … Read more