A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muon

A unified framework is proposed for first-order algorithms in nonconvex unconstrained optimization that use adaptively preconditioned gradients; it includes popular methods such as full and diagonal AdaGrad and AdaNorm, as well as adaptive variants of Shampoo and Muon. The framework also allows combining heterogeneous geometries across different groups of variables while preserving a unified convergence … Read more
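To make the shared template concrete, here is a minimal Python sketch of two members of this family: a diagonal-AdaGrad-style step and a scalar, AdaNorm-style step. The function names and constants are illustrative assumptions and do not reproduce the paper's framework or its convergence guarantees.

```python
import numpy as np

def adagrad_diag_step(x, grad, accum, lr=1e-2, eps=1e-8):
    """One diagonal-AdaGrad-style preconditioned step (illustrative only).

    accum holds the running sum of squared gradients; the preconditioner
    is its elementwise inverse square root.
    """
    accum += grad ** 2
    x = x - lr * grad / (np.sqrt(accum) + eps)
    return x, accum

def adanorm_step(x, grad, norm_accum, lr=1e-2, eps=1e-8):
    """A norm-based (AdaNorm-style) variant: a single scalar preconditioner
    built from the accumulated squared gradient norms."""
    norm_accum += np.linalg.norm(grad) ** 2
    x = x - lr * grad / (np.sqrt(norm_accum) + eps)
    return x, norm_accum
```

Both steps instantiate the same update template x_{k+1} = x_k - lr * P_k^{-1} grad_k; only the choice of preconditioner P_k (diagonal versus scalar) differs.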

New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes

In this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of … Read more
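For reference, the classical Polyak stepsize sets t_k = (f(x_k) - f*) / ||grad f(x_k)||^2, which requires knowing the optimal value f*. Below is a minimal sketch of PolyakGD under that assumption, using a gradient oracle; it illustrates the method only and does not reproduce the paper's analysis.

```python
import numpy as np

def polyak_gd(f, grad_f, x0, f_star, iters=100):
    """Gradient descent with the classical Polyak stepsize
    t_k = (f(x_k) - f*) / ||grad f(x_k)||^2 (requires knowing f*)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_f(x)
        gnorm2 = np.dot(g, g)
        if gnorm2 == 0.0:          # stationary point reached
            break
        t = (f(x) - f_star) / gnorm2
        x = x - t * g
    return x

# Example: minimize the quadratic f(x) = 0.5 ||x||^2 with known f* = 0.
# x_hat = polyak_gd(lambda x: 0.5 * x @ x, lambda x: x, np.ones(5), f_star=0.0)
```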

Gradient Methods with Online Scaling Part I. Theoretical Foundations

This paper establishes the theoretical foundations of online scaled gradient methods (OSGM), a framework that uses online learning to adapt stepsizes and provably accelerates first-order methods. OSGM quantifies the effectiveness of a stepsize by a feedback function derived from a convergence measure and uses the feedback to adjust the stepsize through an online learning … Read more
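As a toy illustration of the idea (not the paper's OSGM algorithm), the sketch below treats a scalar stepsize as an online decision and adjusts it with a hypergradient of a simple feedback signal, the next-iterate objective value. All names, the feedback choice, and the parameters are assumptions for illustration.

```python
import numpy as np

def osgm_like(f, grad_f, x0, p0=1e-3, meta_lr=1e-4, iters=200):
    """Toy sketch in the spirit of OSGM: the scalar stepsize p is updated by
    an online (hyper)gradient step on the feedback f(x - p * grad).
    Illustrative only; the paper's feedback functions and guarantees differ."""
    x = np.asarray(x0, dtype=float)
    p = p0
    for _ in range(iters):
        g = grad_f(x)
        x_next = x - p * g
        # d/dp f(x - p g) = -grad_f(x - p g) . g   (feedback gradient in p)
        hypergrad = -np.dot(grad_f(x_next), g)
        p = max(p - meta_lr * hypergrad, 1e-12)    # online stepsize update
        x = x_next
    return x, p
```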

prunAdag: an adaptive pruning-aware gradient method

A pruning-aware adaptive gradient method is proposed that classifies the variables into two sets before updating them with different strategies. This technique extends the “relevant/irrelevant” approach of Ding (2019) and Zimmer et al. (2022) and allows a posteriori sparsification of the solution of model parameter fitting problems. The new method is proved to be convergent … Read more
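The sketch below illustrates one plausible reading of such a relevant/irrelevant split, assuming a magnitude-based classification and a diagonal-AdaGrad-style step. It is not the paper's prunAdag update rule; the classification criterion, shrinkage factor, and other parameters are hypothetical.

```python
import numpy as np

def prunadag_like_step(x, grad, accum, lr=1e-2, eps=1e-8,
                       rel_frac=0.5, shrink=0.9):
    """Illustrative pruning-aware split update (assumed, not prunAdag itself):
    the largest-magnitude variables are treated as "relevant" and take a
    diagonal-AdaGrad-style step; the remaining "irrelevant" variables are
    additionally shrunk toward zero to ease a posteriori sparsification."""
    accum += grad ** 2
    step = lr * grad / (np.sqrt(accum) + eps)
    k = max(1, int(rel_frac * x.size))
    relevant = np.zeros(x.shape, dtype=bool)
    relevant[np.argsort(np.abs(x))[-k:]] = True
    x = np.where(relevant, x - step, shrink * (x - step))
    return x, accum
```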