Adaptive Finite-Difference Interval Estimation for Noisy Derivative-Free Optimization

A common approach for minimizing a smooth nonlinear function is to employ finite-difference approximations to the gradient. While this can be easily performed when no error is present within the function evaluations, when the function is noisy, the optimal choice of the differencing interval requires information about the noise level and higher-order derivatives of the function, which is often …
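As a rough illustration of the trade-off, a minimal sketch in Python is given below: for a forward difference, balancing the truncation error (h/2)|f''| against the noise error 2ε_f/h gives the classical near-optimal interval h ≈ 2√(ε_f/|f''|). The function name `fd_gradient` and the arguments `noise_level` (an estimate of ε_f) and `mu` (a bound on |f''|) are illustrative, not the paper's notation, and the sketch assumes both quantities are known rather than estimated adaptively.

```python
import numpy as np

def fd_gradient(f, x, noise_level, mu=1.0):
    """Forward-difference gradient with a near-optimal interval.

    Balancing the truncation error (h/2)*|f''| against the noise error
    2*eps_f/h yields h = 2*sqrt(eps_f / |f''|); `noise_level` stands in
    for eps_f and `mu` for a bound on |f''| along each coordinate.
    """
    x = np.asarray(x, dtype=float)
    h = 2.0 * np.sqrt(noise_level / max(mu, 1e-16))
    g = np.empty_like(x)
    fx = f(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

# Example: a quadratic contaminated with additive noise of size ~1e-6.
rng = np.random.default_rng(0)
f = lambda x: 0.5 * np.dot(x, x) + 1e-6 * rng.uniform(-1, 1)
print(fd_gradient(f, np.array([1.0, -2.0]), noise_level=1e-6, mu=1.0))
```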

On the Numerical Performance of Derivative-Free Optimization Methods Based on Finite-Difference Approximations

The goal of this paper is to investigate an approach for derivative-free optimization that has not received sufficient attention in the literature and is yet one of the simplest to implement and parallelize. It consists of computing gradients of a smoothed approximation of the objective function (and constraints), and employing them within established codes. These …
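The mechanical coupling of a finite-difference gradient with an established code can be sketched as follows; the paper studies several interval and smoothing choices, whereas this assumed example simply passes a central-difference gradient callback (here called `central_diff_grad`, with a fixed interval `h`) to SciPy's L-BFGS-B solver.

```python
import numpy as np
from scipy.optimize import minimize

def central_diff_grad(f, h=1e-5):
    """Return a gradient callback based on central differences."""
    def grad(x):
        g = np.empty_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
        return g
    return grad

# Rosenbrock objective: the gradient seen by L-BFGS-B is purely
# finite-difference based, so no analytic derivatives are needed.
rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
x0 = np.array([-1.2, 1.0])
res = minimize(rosen, x0, jac=central_diff_grad(rosen), method="L-BFGS-B")
print(res.x)
```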

A Noise-Tolerant Quasi-Newton Method for Unconstrained Optimization

This paper describes an extension of the BFGS and L-BFGS methods for the minimization of a nonlinear function subject to errors. This work is motivated by applications that contain computational noise, employ low-precision arithmetic, or are subject to statistical noise. The classical BFGS and L-BFGS methods can fail in such circumstances because the updating procedure …
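One way the updating procedure can be protected is sketched below. The guard on the curvature pair (s, y) is an illustrative stand-in for the paper's noise-control mechanism, which instead lengthens the differencing interval in the line search; the constant `beta` and the `noise_level` argument are assumptions made for this sketch.

```python
import numpy as np

def bfgs_update(H, s, y, noise_level):
    """Apply the inverse-BFGS update only when curvature info is trustworthy.

    If s^T y is not sufficiently larger than a noise-dependent threshold,
    the update is skipped so that noisy gradient differences cannot
    corrupt the Hessian approximation.
    """
    beta = 2.0  # hypothetical safeguard constant
    sy = float(s @ y)
    if sy <= beta * noise_level * np.linalg.norm(s):
        return H  # skip update: curvature signal drowned by noise
    rho = 1.0 / sy
    I = np.eye(H.shape[0])
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```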

A Progressive Batching L-BFGS Method for Machine Learning

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise …
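A progressive batching rule of the kind studied here can be sketched with a gradient-variance test: when the sample variance of the batch gradient is large relative to its norm, the batch size is increased. The function `next_batch_size`, the constant `theta`, and the growth factor below are illustrative assumptions, not the paper's exact rules.

```python
import numpy as np

def next_batch_size(per_sample_grads, batch_size, n_total, theta=0.9):
    """Progressive batching heuristic based on a gradient-variance test.

    `per_sample_grads` is a (b, d) array of individual gradients from the
    current batch.  If the estimated variance of the batch-mean gradient
    exceeds theta^2 * ||g||^2, the batch is deemed too noisy and enlarged.
    """
    b = per_sample_grads.shape[0]
    g = per_sample_grads.mean(axis=0)
    # Sample variance of the batch-mean gradient (trace of covariance / b).
    var = per_sample_grads.var(axis=0, ddof=1).sum() / b
    if var > (theta ** 2) * float(g @ g):
        return min(int(np.ceil(1.5 * batch_size)), n_total)
    return batch_size
```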