Training Structured Neural Networks Through Manifold Identification and Variance Reduction

This paper proposes an algorithm, RMDA, for training neural networks (NNs) with a regularization term for promoting desired structures. RMDA does not incur computation additional to proximal SGD with momentum, and achieves variance reduction without requiring the objective function to be of the finite-sum form. Through the tool of manifold identification from nonlinear optimization, we … Read more

A Proximal Quasi-Newton Trust-Region Method for Nonsmooth Regularized Optimization

We develop a trust-region method for minimizing the sum of a smooth term f and a nonsmooth term h, both of which can be nonconvex. Each iteration of our method minimizes apossibly nonconvex model of f+h in a trust region. The model coincides with f+h in value and subdifferential at the center. We establish global … Read more

Accelerating Inexact Successive Quadratic Approximation for Regularized Optimization Through Manifold Identification

For regularized optimization that minimizes the sum of a smooth term and a regularizer that promotes structured solutions, inexact proximal-Newton-type methods, or successive quadratic approximation (SQA) methods, are widely used for their superlinear convergence in terms of iterations. However, unlike the counter parts in smooth optimization, they suffer from lengthy running time in solving regularized … Read more

A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization

We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving empirical risk minimization (ERM) problems with a nonsmooth regularization term. Our algorithm is applicable to both the primal and the dual ERM problem. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or … Read more

A Distributed Quasi-Newton Algorithm for Empirical Risk Minimization with Nonsmooth Regularization

We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving ERM problems with a nonsmooth regularization term. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or work only for specific regularizers. Our algorithm uses successive quadratic approximations, and we describe how to … Read more

Inexact Successive Quadratic Approximation for Regularized Optimization

Successive quadratic approximations, or second-order proximal methods, are useful for minimizing functions that are a sum of a smooth part and a convex, possibly nonsmooth part that promotes regularization. Most analyses of iteration complexity focus on the special case of proximal gradient method, or accelerated variants thereof. There have been only a few studies of … Read more

Accelerated Block-Coordinate Relaxation for Regularized Optimization

We discuss minimization of a smooth function to which is added a separable regularization function that induces structure in the solution. A block-coordinate relaxation approach with proximal linearized subproblems yields convergence to critical points, while identification of the optimal manifold (under a nondegeneracy condition) allows acceleration techniques to be applied on a reduced space. The … Read more