dual averaging – Optimization Online

Training Structured Neural Networks Through Manifold Identification and Variance Reduction

Published: 2021/12/04, Updated: 2022/03/15

Categories: Nonlinear Optimization, Nonsmooth Optimization
Tags: deep learning, dual averaging, manifold identification, regularized optimization, variance reduction

This paper proposes an algorithm, RMDA, for training neural networks (NNs) with a regularization term for promoting desired structures. RMDA does not incur computation additional to proximal SGD with momentum, and achieves variance reduction without requiring the objective function to be of the finite-sum form. Through the tool of manifold identification from nonlinear optimization, we …

Relatively-Smooth Convex Optimization by First-Order Methods, and Applications

Published: 2016/10/19, Updated: 2017/10/10

Robert M. Freund

Haihao Lu

Yurii Nesterov

Categories: Convex and Nonsmooth Optimization, Convex Optimization
Tags: complexity, d-optimal design, dual averaging, first-order methods, large-scale optimization, primal gradient

The usual approach to developing and analyzing first-order methods for smooth convex optimization assumes that the gradient of the objective function is uniformly smooth with some Lipschitz constant L. However, in many settings the differentiable convex function f(.) is not uniformly smooth, for example in D-optimal design where f(x) := -ln det(HXH^T), or even the univariate …

Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

Published: 2011/07/18, Updated: 2012/06/01

Sangkyun Lee

Stephen Wright

Categories: Convex Optimization, Data-Mining
Tags: dual averaging, manifold identification, partly smooth manifold, regularization

Iterative methods that calculate their steps from approximate subgradient directions have proved to be useful for stochastic learning problems over large and streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, the solution often lies on a low-dimensional manifold of parameter space along which the regularizer is smooth. …