A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muon

A unified framework for first-order optimization algorithms for nonconvex unconstrained optimization is proposed that uses adaptively preconditioned gradients and includes popular methods such as full and diagonal AdaGrad, AdaNorm, as well as adpative variants of Shampoo and Muon. This framework also allows combining heterogeneous geometries across different groups of variables while preserving a unified convergence … Read more

An objective-function-free algorithm for nonconvex stochastic optimization with deterministic equality and inequality constraints

An algorithm is proposed for solving optimization problems with stochastic objective and deterministic equality and inequality constraints. This algorithm is objective-function-free in the sense that it only uses the objective’s gradient and never evaluates the function value. It is based on an adaptive selection of function-decreasing and constraint-improving iterations, the first ones using an Adagrad-type … Read more

An objective-function-free algorithm for general smooth constrained optimization

A new algorithm for smooth constrained optimization is proposed that never computes the value of the problem’s objective function and that handles both equality and inequality constraints. The algorithm uses an adaptive switching strategy between a normal step aiming at reducing constraint’s infeasibility and a tangential step improving dual optimality, the latter being inspired by … Read more

Iteration complexity of the Difference-of-Convex Algorithm for unconstrained optimization: a simple proof

We propose a simple proof of the worst-case iteration complexity for the Difference of Convex functions Algorithm (DCA) for unconstrained minimization, showing that the global rate of convergence of the norm of the objective function’s gradients at the iterates converge to zero like $o(1/k)$. A small example is also provided indicating that the rate cannot … Read more

A Simple First-Order Algorithm for Full-Rank Equality Constrained Optimization

A very simple first-order algorithm is proposed for solving nonlinear optimization problems with deterministic nonlinear equality constraints. This algorithm adaptively selects steps in the plane tangent to the constraints or steps that reduce infeasibility, without using a merit function or a filter. The tangent steps are based on the AdaGrad method for unconstrained minimization. The … Read more

Recursive Bound-Constrained AdaGrad with Applications to Multilevel and Domain Decomposition Minimization

Two OFFO (Objective-Function Free Optimization) noise tolerant algorithms are presented that handle bound constraints, inexact gradients and use second-order information when available. The first is a multi-level method exploiting a hierarchical description of the problem and the second is a domain-decomposition method covering the standard addditive Schwarz decompositions. Both are generalizations of the first-order AdaGrad … Read more

Fast Stochastic Second-Order Adagrad for Nonconvex Bound-Constrained Optimization

ADAGB2, a generalization of the Adagrad algorithm for stochastic optimization is introduced, which is also applicable to bound-constrained problems and capable of using second-order information when available. It is shown that, given  delta in (0,1) and epsilon in (0,1], the ADAGB2 algorithm needs at most O(epsilon^{-2}) iterations to ensure an epsilon-approximate first-order critical point of … Read more

A Fast Newton Method Under Local Lipschitz Smoothness

A new, fast second-order method is proposed that achieves the optimal \(\mathcal{O}\left(|\log(\epsilon)|\epsilon^{-3/2}\right) \) complexity to obtain first-order $\epsilon$-stationary points. Crucially, this is deduced without assuming the standard global Lipschitz Hessian continuity condition, but onlyusing an appropriate local smoothness requirement. The algorithm exploits Hessian information to compute a Newton step and a negative curvature step when … Read more

prunAdag: an adaptive pruning-aware gradient method

A pruning-aware adaptive gradient method is proposed which classifies the variables in two sets before updating them using different strategies. This technique extends the “relevant/irrelevant” approach of Ding (2019) and Zimmer et al. (2022) and allows a posteriori sparsification of the solution of model parameter fitting problems. The new method is proved to be convergent … Read more

Examples of slow convergence for adaptive regularization optimization methods are not isolated

The adaptive regularization algorithm for unconstrained nonconvex optimization was shown in Nesterov and Polyak (2006) and Cartis, Gould and Toint (2011) to  require, under standard assumptions, at most O(\epsilon^{3/(3-q)}) evaluations of the objective function and its derivatives of degrees one and two to produce an \epsilon-approximate critical point of order q in {1,2}. This bound … Read more