gradient descent – Optimization Online

On Averaging and Extrapolation for Gradient Descent

Published: 2024/02/19, Updated: 2026/02/11

Categories: Convex Optimization, Semi-definite Programming
Tags: averaging, extrapolation, gradient descent, performance estimation problems, smooth convex optimization

This work considers the effect of averaging, and more generally extrapolation, of the iterates of gradient descent in smooth convex optimization. After running the method, rather than reporting the final iterate, one can report either a convex combination of the iterates (averaging) or a generic combination of the iterates (extrapolation). For several common stepsize sequences, …
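A minimal Python sketch of the three reporting options, assuming a two-dimensional quadratic $f(x)=\tfrac12(x_1^2 + 0.1x_2^2)$ (so $L = 1$), a constant stepsize $1/L$, and illustrative weights. The uniform average is a convex combination of $x_0,\dots,x_T$, while $2x_T - x_{T-1}$ is a simple extrapolation whose weights sum to one but include a negative entry. The helper names and weights are hypothetical and are not the ones analyzed in the work.

```python
# Sketch: reporting the last iterate, an average, or an extrapolation of
# gradient descent iterates. Test function, stepsize, and weights are
# illustrative assumptions.

A = (1.0, 0.1)  # diagonal Hessian of f(x) = 0.5 * sum(a_i * x_i^2); L = 1.0


def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A, x))


def grad(x):
    return [a * xi for a, xi in zip(A, x)]


def gd_iterates(x0, step, T):
    """Run T steps of gradient descent and return all iterates x_0, ..., x_T."""
    xs = [list(x0)]
    x = list(x0)
    for _ in range(T):
        x = [xi - step * gi for xi, gi in zip(x, grad(x))]
        xs.append(x)
    return xs


def combine(xs, weights):
    """Return sum_k weights[k] * xs[k]; weights are expected to sum to 1."""
    return [sum(w * x[i] for w, x in zip(weights, xs)) for i in range(len(xs[0]))]


T = 10
xs = gd_iterates([1.0, 1.0], step=1.0, T=T)           # step = 1/L with L = 1
last = xs[-1]
average = combine(xs, [1.0 / (T + 1)] * (T + 1))      # convex combination
extrap = combine(xs, [0.0] * (T - 1) + [-1.0, 2.0])   # 2*x_T - x_{T-1}

for name, point in [("last", last), ("average", average), ("extrapolated", extrap)]:
    print(f"{name:>12}: f = {f(point):.6f}")
```

Which report comes out lowest depends on the function and the stepsize sequence, which is the question the work addresses.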

Accelerated Gradient Descent via Long Steps

Published: 2023/09/23, Updated: 2023/10/01

Benjamin Grimmer

Kevin Shu

Alex L. Wang

Categories: Convex and Nonsmooth Optimization, Convex Optimization, Unconstrained Optimization
Tags: acceleration, Convergence Guarantees, convex, gradient descent, performance estimation, smooth

Recently, Grimmer [1] showed that, for smooth convex optimization, periodically using longer steps improves gradient descent’s state-of-the-art O(1/T) convergence guarantees by constant factors, and conjectured that an accelerated rate strictly faster than O(1/T) could be possible. Here we prove such a big-O gain, establishing gradient descent’s first accelerated convergence rate in this setting. Namely, we …
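For reference, the O(1/T) baseline is the classical guarantee for gradient descent with constant stepsize $1/L$ on an $L$-smooth convex function $f$ with a minimizer $x_\star$. The constant shown is the textbook one, not the sharper constants obtained in the long-step analyses:

```latex
x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k), \qquad
f(x_T) - f(x_\star) \le \frac{L\,\|x_0 - x_\star\|^2}{2T}.
```

Long-step schedules keep the same iteration but vary the stepsize from one iteration to the next; an accelerated rate in the sense above means the worst-case gap shrinks faster than a constant times $1/T$.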

Provably Faster Gradient Descent via Long Steps

Published: 2023/07/11, Updated: 2024/02/04

Benjamin Grimmer

Categories: Convex Optimization, Semi-definite Programming
Tags: acceleration, computer-assisted, convergence rates, convex, gradient descent, PEP, smooth

This work establishes provably faster convergence rates for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps, which may violate descent, by analyzing the overall effect of many iterations at once rather than relying on the one-iteration inductions used in most first-order method analyses. We …
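A minimal sketch of what "long steps potentially violating descent" means, assuming a two-dimensional quadratic with $L = 1$ and a hypothetical repeating pattern of normalized stepsizes $(1.5, 4.9, 1.5)/L$. The middle step exceeds $2/L$, so it can increase $f$, yet on this instance the cycle as a whole still contracts. The pattern is chosen only for illustration and is not taken from the work.

```python
# Sketch: gradient descent with a periodic stepsize pattern containing a
# "long" step larger than 2/L. The pattern (1.5, 4.9, 1.5) / L is a
# hypothetical illustration, not a schedule taken from the work.

CURVATURES = (1.0, 0.05)  # f(x) = 0.5 * sum(c_i * x_i^2), so L = 1.0
L = max(CURVATURES)


def f(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURES, x))


def grad(x):
    return [c * xi for c, xi in zip(CURVATURES, x)]


def run(pattern, x0, cycles):
    """Apply the normalized stepsizes in `pattern` (units of 1/L) cyclically."""
    x = list(x0)
    values = [f(x)]
    for _ in range(cycles):
        for h in pattern:
            x = [xi - (h / L) * gi for xi, gi in zip(x, grad(x))]
            values.append(f(x))
    return values


long_vals = run((1.5, 4.9, 1.5), [1.0, 1.0], cycles=30)
const_vals = run((1.0,), [1.0, 1.0], cycles=90)  # same number of gradient steps

increases = sum(1 for a, b in zip(long_vals, long_vals[1:]) if b > a)
print("steps that increased f (long-step schedule):", increases)
print("final f, long steps:   ", long_vals[-1])
print("final f, constant 1/L: ", const_vals[-1])
```

On this particular quadratic the constant stepsize actually finishes lower, because the stiff coordinate is damped by only a factor of 0.975 per cycle. The improvements described in the work are worst-case guarantees over all $L$-smooth convex functions, not per-instance speedups.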

Preconditioned Gradient Descent for Overparameterized Nonconvex Burer–Monteiro Factorization with Global Optimality Certification

Published: 2022/06/06

Gavin Zhang

Salar Fattahi

Richard Y. Zhang

Categories: Nonlinear Optimization
Tags: gradient descent, low-rank matrix factorization, nonconvex optimization

We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its …

Survey Descent: A Multipoint Generalization of Gradient Descent for Nonsmooth Optimization

Published: 2021/11/30, Updated: 2022/09/27

X.Y. Han

Adrian Lewis

Categories: Convex Optimization, Nonsmooth Optimization
Tags: active sets, convex optimization, gradient descent, linear convergence, max functions, minimax optimization, multipoint method, nonsmooth optimization, optimal methods, survey descent

For strongly convex objectives that are smooth, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging. Even when the objective is smooth at every iterate, the corresponding local models are unstable and the number of cutting planes invoked by traditional remedies is …

Adaptive Gradient Descent without Descent

Published: 2019/10/29, Updated: 2019/10/31

Yura Malitsky

Konstantin Mishchenko

Categories: Convex Optimization
Tags: adaptivity, gradient descent, linesearch, stepsize

We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don’t increase the stepsize too fast and 2) don’t overstep the local curvature. No need for functional values, no line search, no information about the function except for the gradients. By following these rules, you get a method adaptive …
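The two rules can be sketched as a stepsize update. The version below is our reading of the rules, stated as an assumption rather than a transcription of the paper's algorithm: the new stepsize is the minimum of a growth cap $\sqrt{1+\theta_{k-1}}\,\lambda_{k-1}$ (rule 1) and a local-curvature estimate $\|x_k-x_{k-1}\|/(2\|\nabla f(x_k)-\nabla f(x_{k-1})\|)$ (rule 2), with $\theta_k=\lambda_k/\lambda_{k-1}$. Only gradients are used, with no function values and no line search. The test quadratic, `adaptive_gd`, and the initial stepsize `lam0` are illustrative.

```python
import math

# Sketch of an adaptive stepsize built from the two rules quoted above:
#   rule 1 (do not grow too fast):      lam_k <= sqrt(1 + theta_{k-1}) * lam_{k-1}
#   rule 2 (do not overstep curvature): lam_k <= ||x_k - x_{k-1}|| / (2 ||g_k - g_{k-1}||)
# with theta_k = lam_k / lam_{k-1}. The test function and lam0 are assumptions.

CURV = (1.0, 10.0)  # f(x) = 0.5 * sum(c_i * x_i^2), so L = 10


def f(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURV, x))


def grad(x):
    return [c * xi for c, xi in zip(CURV, x)]


def adaptive_gd(x0, lam0=1e-3, iters=1000):
    x_prev, g_prev = list(x0), grad(x0)
    x = [xi - lam0 * gi for xi, gi in zip(x_prev, g_prev)]  # first step uses lam0
    lam_prev, theta = lam0, math.inf  # theta = inf leaves only rule 2 active at first
    steps = []
    for _ in range(iters):
        g = grad(x)
        dx = math.hypot(*(a - b for a, b in zip(x, x_prev)))
        dg = math.hypot(*(a - b for a, b in zip(g, g_prev)))
        curvature_cap = dx / (2.0 * dg) if dg > 0 else math.inf
        lam = min(math.sqrt(1.0 + theta) * lam_prev, curvature_cap)
        if math.isinf(lam):  # neither rule binds (e.g. gradient unchanged): reuse old step
            lam = lam_prev
        x_prev, g_prev = x, g
        x = [xi - lam * gi for xi, gi in zip(x, g)]
        theta, lam_prev = lam / lam_prev, lam
        steps.append(lam)
    return x, steps


x_final, steps = adaptive_gd([1.0, 1.0])
print("f(x_final) =", f(x_final))
print("largest stepsize used:", max(steps), "vs 2/L =", 2.0 / max(CURV))
```

On this quadratic the largest stepsize exceeds $2/L$, the threshold beyond which a single step can increase $f$ along the stiffest direction, which is consistent with the "without descent" in the title.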

Complexity of gradient descent for multiobjective optimization

Published: 2018/04/17

Jörg Fliege

A. Ismael F. Vaz

Luis Nunes Vicente

Categories: Convex and Nonsmooth Optimization, Nonlinear Optimization
Tags: global rates, gradient descent, multiobjective optimization, steepest descent, worst-case complexity

A number of first-order methods have been proposed for smooth multiobjective optimization for which some form of convergence to first-order criticality has been proved. Such convergence is global in the sense of being independent of the starting point. In this paper we analyze the rate of convergence of gradient descent for smooth unconstrained multiobjective …
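In multiobjective steepest descent, the search direction is the negative of the minimum-norm convex combination of the objectives' gradients, and a point is first-order (Pareto) critical when that combination can be made zero. A minimal two-objective sketch, assuming the quadratics $f_1(x)=\|x-a\|^2$ and $f_2(x)=\|x-b\|^2$ (whose Pareto set is the segment between $a$ and $b$) and a fixed stepsize $t = 0.1$ in place of whatever stepsize rule the paper analyzes; the function names are hypothetical.

```python
import math

# Sketch: multiobjective steepest descent for two objectives. The direction is
# d = -(lam * g1 + (1 - lam) * g2), where lam in [0, 1] minimizes the norm of
# the combined gradient. The objectives and fixed stepsize t are assumptions.

A, B = (0.0, 0.0), (1.0, 0.0)  # f1 = ||x - A||^2, f2 = ||x - B||^2; Pareto set = segment [A, B]


def grads(x):
    g1 = [2.0 * (xi - ai) for xi, ai in zip(x, A)]
    g2 = [2.0 * (xi - bi) for xi, bi in zip(x, B)]
    return g1, g2


def min_norm_direction(g1, g2):
    """Return -(lam*g1 + (1-lam)*g2) with lam in [0, 1] minimizing its norm."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        lam = 0.5  # identical gradients: every lam gives the same vector
    else:
        # unconstrained minimizer of ||g2 + lam * (g1 - g2)||^2, then clipped to [0, 1]
        lam = sum((b - a) * b for a, b in zip(g1, g2)) / denom
        lam = min(1.0, max(0.0, lam))
    return [-(lam * a + (1.0 - lam) * b) for a, b in zip(g1, g2)]


def multiobjective_gd(x0, t=0.1, tol=1e-10, max_iters=10_000):
    x = list(x0)
    for k in range(max_iters):
        d = min_norm_direction(*grads(x))
        if math.hypot(*d) <= tol:  # approximate Pareto criticality
            return x, k
        x = [xi + t * di for xi, di in zip(x, d)]
    return x, max_iters


x_crit, iters = multiobjective_gd([3.0, 3.0])
print("Pareto-critical point:", x_crit, "after", iters, "iterations")
```

From $(3, 3)$ the iterates approach the endpoint $(1, 0)$ of the segment; a start whose first coordinate already lies in $[0, 1]$ would instead only drive the second coordinate to zero.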

New analysis of linear convergence of gradient-type methods via unifying error bound conditions

Published: 2016/08/17, Updated: 2017/06/23

Hui Zhang

Categories: Convex and Nonsmooth Optimization, Nonlinear Optimization
Tags: cyclic block coordinate gradient descent, dual gradient algorithm, error-bound condition, gradient descent, linear convergence, nesterov's acceleration, proximal point algorithm

The linear convergence of gradient-type methods for non-strongly convex optimization has been widely studied through several notions introduced as sufficient conditions. Influential examples include the error bound property, the restricted strongly convex property, the quadratic growth property, and the Kurdyka-Łojasiewicz property. In this paper, we first define a group of error bound conditions …
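For orientation, here are textbook forms of three of the conditions named above for a smooth $f$ with minimum value $f^\star$ and solution set $\mathcal{X}^\star$. The constants $\kappa,\mu>0$ are generic, and the paper's own unified group of error bound conditions may be stated differently:

```latex
\begin{aligned}
&\text{Error bound (EB):}          && \operatorname{dist}(x,\mathcal{X}^\star) \le \kappa\,\|\nabla f(x)\| \\
&\text{Quadratic growth (QG):}     && f(x) - f^\star \ge \tfrac{\mu}{2}\,\operatorname{dist}(x,\mathcal{X}^\star)^2 \\
&\text{Polyak-\L{}ojasiewicz (PL):} && \tfrac12\,\|\nabla f(x)\|^2 \ge \mu\,\bigl(f(x) - f^\star\bigr)
\end{aligned}
```

None of these requires strong convexity: each only controls how $f$ or its gradient behaves relative to the distance to the solution set or the optimality gap.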

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Lojasiewicz Condition

Published: 2016/08/16, Updated: 2020/09/12

Hamed Karimi

Mark Schmidt

Julie Nutini

Categories: Convex and Nonsmooth Optimization
Tags: boosting, coordinate descent, gradient descent, l1-regularization, least squares, logistic regression, stochastic gradient, support vector machines, variance reduction

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Lojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Lojasiewicz (PL) …
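The standard argument behind the linear rate is short: with stepsize $1/L$, $L$-smoothness gives $f(x_{k+1}) \le f(x_k) - \tfrac{1}{2L}\|\nabla f(x_k)\|^2$, and the PL inequality $\tfrac12\|\nabla f(x)\|^2 \ge \mu(f(x)-f^\star)$ turns this into $f(x_{k+1}) - f^\star \le (1-\mu/L)(f(x_k)-f^\star)$. The sketch below runs gradient descent on $f(x) = x^2 + 3\sin^2 x$, a nonconvex one-dimensional function commonly used as a PL example (its PL property is taken as given here). Since $f''(x) = 2 + 6\cos 2x \in [-4, 8]$, it uses $L = 8$; the starting point is an arbitrary choice.

```python
import math

# Sketch: gradient descent with stepsize 1/L on f(x) = x^2 + 3 sin^2(x), which is
# nonconvex (f'' = 2 + 6 cos 2x takes negative values) but is assumed here to
# satisfy the PL inequality. Minimum value f* = 0, attained at x = 0.

L = 8.0  # |f''(x)| <= 8 everywhere


def f(x):
    return x * x + 3.0 * math.sin(x) ** 2


def fprime(x):
    return 2.0 * x + 3.0 * math.sin(2.0 * x)  # d/dx 3 sin^2 x = 3 sin 2x


x = 3.0
values = [f(x)]
for _ in range(50):
    x -= fprime(x) / L
    values.append(f(x))

print("f(x_0) =", values[0], " f(x_50) =", values[-1])
```

Every step decreases $f$, as the descent inequality predicts, even though the function is not convex.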

Gradient Descent only Converges to Minimizers

Published: 2016/02/17

Categories: Unconstrained Optimization
Tags: gradient descent, local minimizer, nonconvex optimization, saddle point problem

We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.
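The phenomenon can be seen on a small strict-saddle example chosen here for illustration: $f(x,y) = \tfrac12 x^2 - \tfrac12 y^2 + \tfrac14 y^4$ has a saddle at the origin and minimizers at $(0,\pm1)$. Its stable manifold under gradient descent is the line $y = 0$, a set of measure zero, so a random initialization almost surely avoids it. This is a numerical illustration under an assumed stepsize of 0.1, not the Stable Manifold Theorem argument itself.

```python
import random

# Sketch: gradient descent near a strict saddle.
# f(x, y) = 0.5 x^2 - 0.5 y^2 + 0.25 y^4; saddle at (0, 0), minimizers at (0, +-1).


def grad(x, y):
    return x, -y + y ** 3


def gd(x, y, step=0.1, iters=500):
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


rng = random.Random(0)
x_rand, y_rand = gd(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))
x_saddle, y_saddle = gd(0.3, 0.0)  # starts exactly on the stable manifold y = 0

print("random init   ->", (x_rand, y_rand))      # reaches a minimizer
print("init on y = 0 ->", (x_saddle, y_saddle))  # converges to the saddle
```

Only the measure-zero set of starts with $y = 0$ is attracted to the saddle; any other start in this region escapes along the $y$ direction, where the Hessian at the origin has a negative eigenvalue.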