A class of algorithms for unconstrained nonconvex optimization is considered where the value of the objective function is never computed. The class contains a deterministic version of the first-order Adagrad method typically used for minimization of noisy function, but also allows the use of second-order information when available. The rate of convergence of methods in the class is analyzed and is shown to be identical to that known for first-order optimization methods using both function and gradients values. The result is essentially sharp and improves on previously known complexity bounds (in the stochastic context) by Defossez et al. (2020) and Gratton et al. (2022). A new class of methods is designed, for which a slightly worse and essentially sharp complexity result holds. Limited numerical experiments show that the new methods’ performance may be comparable to that of standard steepest descent, despite using significantly less information, and that this performance is relatively insensitive to noise.
Citation
arXiv:2203.01757