Accelerated gradient methods on the Grassmann and Stiefel manifolds

In this paper we extend a nonconvex version of Nesterov’s accelerated gradient (AG) method to optimization over the Grassmann and Stiefel manifolds. We propose an exponential-based AG algorithm for the Grassmann manifold and a retraction-based AG algorithm that exploits the Cayley transform for both the Grassmann and Stiefel manifolds. Under some mild assumptions, we … Read more
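Since the abstract is truncated here, the sketch below only illustrates the Cayley-transform retraction that keeps iterates feasible on the Stiefel manifold; it is not the paper's AG algorithm, and the objective, step size, and helper name `cayley_retraction_step` are assumptions made for the example.

```python
import numpy as np

def cayley_retraction_step(X, G, tau):
    """One descent step on the Stiefel manifold St(n, p) = {X : X^T X = I_p}
    using a Cayley-transform curve (illustrative sketch, not the paper's AG method).

    X   : (n, p) matrix with orthonormal columns (current iterate)
    G   : (n, p) Euclidean gradient of the objective at X
    tau : step size
    """
    n = X.shape[0]
    W = G @ X.T - X @ G.T                      # skew-symmetric generator
    I = np.eye(n)
    # Y(tau) = (I + tau/2 W)^{-1} (I - tau/2 W) X remains on St(n, p) exactly
    return np.linalg.solve(I + 0.5 * tau * W, (I - 0.5 * tau * W) @ X)

# Toy usage: minimize f(X) = -trace(X^T A X) over St(5, 2) (leading eigenspace)
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = A + A.T
X, _ = np.linalg.qr(rng.standard_normal((5, 2)))
for _ in range(200):
    G = -2 * A @ X                             # Euclidean gradient of f at X
    X = cayley_retraction_step(X, G, tau=0.1)
print(np.linalg.norm(X.T @ X - np.eye(2)))     # columns stay orthonormal (~1e-15)
```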

Nonlinear conjugate gradient for smooth convex functions

The method of nonlinear conjugate gradients (NCG) is widely used in practice for unconstrained optimization, but at best it satisfies weak complexity bounds when applied to smooth convex functions. In contrast, Nesterov’s accelerated gradient (AG) method is optimal up to constant factors for this class. However, when specialized to quadratic functions, conjugate gradient is optimal … Read more
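As a concrete illustration of the NCG family discussed here (a hedged sketch only, not the variant analyzed in the paper), the following is a minimal Fletcher-Reeves NCG iteration with an Armijo backtracking line search; the test quadratic is purely illustrative.

```python
import numpy as np

def ncg_fletcher_reeves(f, grad, x0, iters=100, tol=1e-8):
    """Minimal nonlinear CG (Fletcher-Reeves) with Armijo backtracking.
    Illustrative sketch only; not the specific variant analyzed in the paper."""
    x = x0.copy()
    g = grad(x)
    d = -g
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                       # safeguard: restart with steepest descent
            d = -g
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx + 1e-4 * t * (g @ d) and t > 1e-12:
            t *= 0.5                         # backtrack until sufficient decrease holds
        x_new = x + t * d
        g_new = grad(x_new)
        beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Usage on a smooth convex quadratic, the case where CG-type methods shine
Q = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x = ncg_fletcher_reeves(lambda v: 0.5 * v @ Q @ v - b @ v, lambda v: Q @ v - b,
                        np.zeros(3), iters=500)
print(x)  # approaches Q^{-1} b = [1, 0.1, 0.01]
```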

A single potential governing convergence of conjugate gradient, accelerated gradient and geometric descent

Nesterov’s accelerated gradient (AG) method for minimizing a smooth strongly convex function $f$ is known to reduce $f({\bf x}_k)-f({\bf x}^*)$ by a factor of $\epsilon\in(0,1)$ after $k=O(\sqrt{L/\ell}\log(1/\epsilon))$ iterations, where $\ell,L$ are the two parameters of smooth strong convexity. Furthermore, it is known that this is the best possible complexity in the function-gradient oracle model of … Read more
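For concreteness, here is a hedged sketch of the constant-momentum form of Nesterov's AG for an $L$-smooth, $\ell$-strongly convex objective, with momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ and $\kappa=L/\ell$, the setting in which the $O(\sqrt{L/\ell}\log(1/\epsilon))$ bound above applies; the quadratic test problem and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def nesterov_ag_strongly_convex(grad, x0, L, ell, iters):
    """Nesterov's AG with constant momentum (sqrt(kappa)-1)/(sqrt(kappa)+1),
    kappa = L/ell, for an L-smooth, ell-strongly convex objective (sketch only)."""
    kappa = L / ell
    beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    x_prev = x = x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)       # extrapolation (momentum) step
        x_prev, x = x, y - grad(y) / L    # gradient step from the extrapolated point
    return x

# Usage: quadratic with ell = 1, L = 100, so sqrt(L/ell) = 10
Q = np.diag([1.0, 100.0])
b = np.ones(2)
x = nesterov_ag_strongly_convex(lambda v: Q @ v - b, np.zeros(2),
                                L=100.0, ell=1.0, iters=200)
print(np.linalg.norm(x - np.linalg.solve(Q, b)))   # error shrinks roughly like (1 - 1/sqrt(kappa))^k
```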

Convergence Rate of Gradient Based Adaptive Restart for Accelerated Gradient Schemes

The accelerated gradient algorithm is known to have non-monotonic, periodic convergence behavior in the high-momentum regime. If important function parameters such as the condition number are known, the momentum can be adjusted to obtain linear convergence. Unfortunately, these parameters are usually not accessible, so heuristics are instead used to decide when to restart. One of … Read more
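One widely used gradient-based restart rule (in the spirit of O'Donoghue and Candès) resets the momentum whenever the momentum step points uphill; the sketch below illustrates that heuristic only and may differ from the exact scheme whose convergence rate is analyzed in the paper.

```python
import numpy as np

def ag_with_gradient_restart(grad, x0, L, iters):
    """Accelerated gradient with a gradient-based adaptive restart:
    drop the momentum whenever grad(y)^T (x_new - x) > 0, i.e. the momentum
    direction is no longer a descent direction (illustrative sketch)."""
    x = y = x0.copy()
    theta = 1.0
    for _ in range(iters):
        g = grad(y)
        x_new = y - g / L                                  # gradient step
        theta_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * theta**2))
        beta = (theta - 1.0) / theta_new                   # momentum coefficient
        if g @ (x_new - x) > 0:                            # gradient-based restart test
            theta_new, beta = 1.0, 0.0
        y = x_new + beta * (x_new - x)
        x, theta = x_new, theta_new
    return x

# Usage: ill-conditioned quadratic, where un-restarted AG oscillates periodically
Q = np.diag([1.0, 500.0])
b = np.ones(2)
x = ag_with_gradient_restart(lambda v: Q @ v - b, np.zeros(2), L=500.0, iters=300)
print(np.linalg.norm(Q @ x - b))   # gradient norm is driven to (near) zero
```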

A unified convergence bound for conjugate gradient and accelerated gradient

Nesterov’s accelerated gradient method for minimizing a smooth strongly convex function $f$ is known to reduce $f({\bf x}_k)-f({\bf x}^*)$ by a factor of $\epsilon\in(0,1)$ after $k\ge O(\sqrt{L/\ell}\log(1/\epsilon))$ iterations, where $\ell,L$ are the two parameters of smooth strong convexity. Furthermore, it is known that this is the best possible complexity in the function-gradient oracle model of computation. The … Read more

Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming

In this paper, we generalize Nesterov’s well-known accelerated gradient (AG) method, originally designed for convex smooth optimization, to solve nonconvex and possibly stochastic optimization problems. We demonstrate that by properly specifying the stepsize policy, the AG method exhibits the best known rate of convergence for solving general nonconvex smooth optimization problems by using first-order … Read more

Incremental Accelerated Gradient Methods for SVM Classification: Study of the Constrained Approach

We investigate constrained first-order techniques for training Support Vector Machines (SVM) for online classification tasks. The methods exploit the structure of the SVM training problem and combine ideas from incremental gradient techniques, gradient acceleration, and successive simple calculations of Lagrange multipliers. Both primal and dual formulations are studied and compared. Experiments show that the … Read more

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

This paper presents an accelerated variant of the hybrid proximal extragradient (HPE) method for convex optimization, referred to as the accelerated HPE (A-HPE) method. Iteration-complexity results are established for the A-HPE method, as well as a special version of it, where a large stepsize condition is imposed. Two specific implementations of the A-HPE method are … Read more