A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muon

A unified framework for first-order optimization algorithms for nonconvex unconstrained optimization is proposed that uses adaptively preconditioned gradients and includes popular methods such as full and diagonal AdaGrad, AdaNorm, as well as adpative variants of Shampoo and Muon. This framework also allows combining heterogeneous geometries across different groups of variables while preserving a unified convergence … Read more

A flexible block coordinate descent method for unconstrained optimization under Hölder continuity

In this work, we propose a flexible block coordinate method for unconstrained optimization problems under Hölder continuity assumptions. The method guarantees convergence to stationary points and has worst-case complexity results comparable to those obtained by single-block methods that assume Lipschitz or Hölder continuity. The approach is based on quadratic models of the objective function combined … Read more

Adaptive Newton-CG methods with global and local analysis for unconstrained optimization with Hölder continuous Hessian

In this paper, we study Newton-conjugate gradient (Newton-CG) methods for minimizing a nonconvex function $f$ whose Hessian is $(H_f,\nu)$-H\”older continuous with modulus $H_f>0$ and exponent \(\nu\in(0,1]\). Recently proposed Newton-CG methods for this problem \cite{he2025newton} adopt (i) non-adaptive regularization and (ii) a nested line-search procedure, where (i) often leads to inefficient early progress and the loss … Read more

A Multi-Secant Limited-Memory BFGS Method

We develop multi-secant BFGS-like quasi-Newton updating scheme, which adaptively selects the number of imposed secant conditions and naturally preserves positivity of approximated Hessian. Compact representation and respective limited-memory formulation are also derived. Numerical stability is assured via unconventional damping technique, which symmetrically handles coordinate and gradient differences. Practical relevance of proposed method is demonstrated via … Read more

Negative Curvature Methods with High-Probability Complexity Guarantees for Stochastic Nonconvex Optimization

This paper develops negative curvature methods for continuous nonlinear unconstrained optimization in stochastic settings, in which function, gradient, and Hessian information is available only through probabilistic oracles, i.e., oracles that return approximations of a certain accuracy and reliability. We introduce conditions on these oracles and design a two-step framework that systematically combines gradient and negative … Read more

Objective-Function Free Multi-Objective Optimization: Rate of Convergence and Performance of an Adagrad-like algorithm

We propose an Adagrad-like algorithm for multi-objective unconstrained optimization that relies on the computation of a common descent direction only. Unlike classical local algorithms for multi-objective optimization, our approach does not rely on the dominance property to accept new iterates, which allows for a flexible and function-free optimization framework. New points are obtained using an … Read more

An adaptive line-search-free multiobjective gradient method and its iteration-complexity analysis

This work introduces an Adaptive Line-Search-Free Multiobjective Gradient (AMG) method for solving smooth multiobjective optimization problems. The proposed approach automatically adjusts stepsizes based on steepest descent directions, promoting robustness with respect to stepsize choice while maintaining low computational cost. The method is specifically tailored to the multiobjective setting and does not rely on function evaluations, … Read more

On Approximate Computation of Critical Points

We show that computing even very coarse approximations of critical points is intractable for simple classes of nonconvex functions. More concretely, we prove that if there exists a polynomial-time algorithm that takes as input a polynomial in \(n\) variables of constant degree (as low as three) and outputs a point whose gradient has Euclidean norm … Read more

Iteration complexity of the Difference-of-Convex Algorithm for unconstrained optimization: a simple proof

We propose a simple proof of the worst-case iteration complexity for the Difference of Convex functions Algorithm (DCA) for unconstrained minimization, showing that the global rate of convergence of the norm of the objective function’s gradients at the iterates converge to zero like $o(1/k)$. A small example is also provided indicating that the rate cannot … Read more

A Majorization-Minimization approach for multiclass classification in a big data scenario

This work presents a novel optimization approach for training linear classifiers in multiclass classification tasks, when focusing on a regularized and smooth Weston-Watkins support vector machine (SVM) model. We propose a Majorization-Minimization (MM) algorithm to solve the resulting, Lipschitz-differentiable, optimization problem. To enhance scalability of the algorithm when tackling large datasets, we introduce an incremental … Read more