Examples of slow convergence for adaptive regularization optimization methods are not isolated

The adaptive regularization algorithm for unconstrained nonconvex optimization was shown in Nesterov and Polyak (2006) and Cartis, Gould and Toint (2011) to require, under standard assumptions, at most O(\epsilon^{-3/(3-q)}) evaluations of the objective function and its derivatives of degrees one and two to produce an \epsilon-approximate critical point of order q in {1,2}. This bound …

Refining asymptotic complexity bounds for nonconvex optimization methods, including why steepest descent is o(eps^{-2}) rather than O(eps^{-2})

We revisit the standard “telescoping sum” argument ubiquitous in the final steps of analyzing evaluation complexity of algorithms for smooth nonconvex optimization, and obtain a refined formulation of the resulting bound as a function of the requested accuracy \(\epsilon\). While bounds obtained using the standard argument typically are of the form \(O(\epsilon^{-\alpha})\) for some …

A Stochastic Objective-Function-Free Adaptive Regularization Method with Optimal Complexity

A fully stochastic second-order adaptive-regularization method for unconstrained nonconvex optimization is presented which never computes the objective-function value, yet achieves the optimal $\mathcal{O}(\epsilon^{-3/2})$ complexity bound for finding first-order critical points. The method is noise-tolerant and the inexactness conditions required for convergence depend on the history of past steps. Applications to cases where derivative …

S2MPJ and CUTEst optimization problems for Matlab, Python and Julia

A new decoder for the SIF test problems of the CUTEst collection is described, which produces problem files allowing the computation of values and derivatives of the objective function and constraints of most CUTEst problems directly within “native” Matlab, Python or Julia, without any additional installation or interfacing with MEX files or Fortran programs. When …

Complexity of Adagrad and other first-order methods for nonconvex optimization problems with bounds constraints

A parametric class of trust-region algorithms for constrained nonconvex optimization is analyzed, where the objective function is never computed. By defining appropriate first-order stationarity criteria, we are able to extend the Adagrad method to the newly considered problem and retrieve the standard complexity rate of the projected gradient method that uses both the gradient and …

An optimally fast objective-function-free minimization algorithm using random subspaces

Yet another fast variant of Newton’s method for nonconvex optimization

A second-order algorithm is proposed for minimizing smooth nonconvex functions that alternates between regularized Newton and negative curvature steps. In most cases, the Hessian matrix is regularized with the square root of the current gradient and an additional term taking moderate negative curvature into account, a negative curvature step being taken only exceptionally. As …

Multilevel Objective-Function-Free Optimization with an Application to Neural Networks Training

A class of multi-level algorithms for unconstrained nonlinear optimization is presented which does not require the evaluation of the objective function. The class contains the momentum-less AdaGrad method as a particular (single-level) instance. The choice of avoiding the evaluation of the objective function is intended to make the algorithms of the class less sensitive to …
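
To illustrate what "objective-function-free" means in the single-level case mentioned above, here is a minimal sketch of the standard component-wise, momentum-less AdaGrad update. It is not the multilevel algorithm of the paper. The function names, the step size `eta`, the initial accumulator value `varsigma` and the quadratic test problem are all assumptions chosen for illustration. Note that only the gradient is ever called; the objective value is never computed.

```python
import math

def adagrad_offo(grad, x0, steps=200, eta=1.0, varsigma=1e-2):
    """Momentum-less AdaGrad sketch: uses gradients only, never f(x).

    grad     -- callable returning the gradient as a list (hypothetical name)
    eta      -- step-size scaling (assumed value)
    varsigma -- small positive initial accumulator value (assumed value)
    """
    x = list(x0)
    acc = [varsigma] * len(x)  # running sum of squared gradient components
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            acc[i] += gi * gi                      # accumulate g_i^2
            x[i] -= eta * gi / math.sqrt(acc[i])   # scaled gradient step
    return x

def grad_quadratic(x):
    # Gradient of the illustrative function f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2).
    return [x[0], 10.0 * x[1]]

x_final = adagrad_offo(grad_quadratic, [3.0, -2.0])
grad_norm = math.sqrt(sum(gi * gi for gi in grad_quadratic(x_final)))
print(grad_norm)
```

Because the step length is set by the accumulated squared gradients rather than by comparing objective values, no acceptance test on f is needed, which is the property the abstract emphasizes.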

Convergence properties of an Objective-Function-Free Optimization regularization algorithm, including an $\mathcal{O}(\epsilon^{-3/2})$ complexity bound

An adaptive regularization algorithm for unconstrained nonconvex optimization is presented in which the objective function is never evaluated, but only derivatives are used. This algorithm belongs to the class of adaptive regularization methods, for which optimal worst-case complexity results are known for the standard framework where the objective function is evaluated. It is shown in …

OFFO minimization algorithms for second-order optimality and their complexity

An Adagrad-inspired class of algorithms for smooth unconstrained optimization is presented in which the objective function is never evaluated and yet the gradient norms decrease at least as fast as O(1/\sqrt{k+1}) while second-order optimality measures converge to zero at least as fast as O(1/(k+1)^{1/3}). This latter rate of convergence is shown to be essentially sharp …