Stochastic convergence of parallel asynchronous adaptive first-order methods

A new class of asynchronous adaptive first-order optimization methods is introduced, comprising asynchronous variants of several popular algorithms. Versions of these methods using momentum and/or inexact normalization are also considered. The convergence of methods in the class on non-convex functions is analyzed in a fully stochastic setting, and is shown to be (up to logarithmic … Read more

A Parameter-Free Restart Scheme with Only a Parallelizable $\log\log(1/\epsilon)$ Overhead

It is well-known that first-order methods can offer accelerated convergence rates in the presence of growth structures. Restarting schemes provide a general tool for such speed-ups. These schemes typically either require unrealistic problem knowledge, incur logarithmic overhead factors in oracle complexity, and/or have a nontrivial initial burn-in phase. We present a parameter-free approach for restarting … Read more

Solving bilevel optimization via sequential minimax optimization

In this paper we propose a sequential minimax optimization (SMO) method for solving a class of constrained bilevel optimization problems in which the lower-level part is a possibly nonsmooth convex optimization problem, while the upper-level part is a possibly nonconvex optimization problem. Specifically, SMO applies a first-order method to solve a sequence of minimax subproblems, … Read more

GFORS: GPU-Accelerated First-Order Method with Randomized Sampling for Binary Integer Programs

We present GFORS, a GPU-accelerated framework for large binary integer programs. It couples a first-order (PDHG-style) routine that guides the search in the continuous relaxation with a randomized, feasibility-aware sampling module that generates batched binary candidates. Both components are designed to run end-to-end on GPUs with minimal CPU–GPU synchronization. The framework establishes near-stationary-point guarantees for … Read more

A Simple First-Order Algorithm for Full-Rank Equality Constrained Optimization

A very simple first-order algorithm is proposed for solving nonlinear optimization problems with deterministic nonlinear equality constraints. This algorithm adaptively selects steps in the plane tangent to the constraints or steps that reduce infeasibility, without using a merit function or a filter. The tangent steps are based on the AdaGrad method for unconstrained minimization. The … Read more

Swapping objectives accelerates Davis-Yin splitting

In this work, we investigate the application of Davis–Yin splitting (DYS) to convex optimization problems and demonstrate that swapping the roles of the two nonsmooth convex functions can result in a faster convergence rate. Such a swap typically yields a different sequence of iterates, but its impact on convergence behavior has been largely understudied or … Read more

An Adaptive and Parameter-Free Nesterov’s Accelerated Gradient Method for Convex Optimization

We propose AdaNAG, an adaptive accelerated gradient method based on Nesterov’s accelerated gradient method. AdaNAG is line-search-free, parameter-free, and achieves the accelerated convergence rates \( f(x_k) – f_\star = \mathcal{O}\left(1/k^2\right) \) and \( \min_{i\in\left\{1,\dots, k\right\}} \|\nabla f(x_i)\|^2 = \mathcal{O}\left(1/k^3\right) \) for \( L \)-smooth convex function \( f \). We provide a Lyapunov analysis for … Read more

Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent

This paper investigates the convergence properties of the hypergradient descent method (HDM), a 25-year-old heuristic originally proposed for adaptive stepsize selection in stochastic first-order methods. We provide the first rigorous convergence analysis of HDM using the online learning framework of [Gao24] and apply this analysis to develop new state-of-the-art adaptive gradient methods with empirical and … Read more

Gradient Methods with Online Scaling

We introduce a framework to accelerate the convergence of gradient-based methods with online learning. The framework learns to scale the gradient at each iteration through an online learning algorithm and provably accelerates gradient-based methods asymptotically. In contrast with previous literature, where convergence is established based on worst-case analysis, our framework provides a strong convergence guarantee … Read more

Efficient parameter-free restarted accelerated gradient methods for convex and strongly convex optimization

This paper develops a new parameter-free restarted method, namely RPF-SFISTA, and a new parameter-free aggressive regularization method, namely A-REG, for solving strongly convex and convex composite optimization problems, respectively. RPF-SFISTA has the major advantage that it requires no knowledge of both the strong convexity parameter of the entire composite objective and the Lipschitz constant of … Read more