Quasi-Newton Methods for Deep Learning: Forget the Past, Just Sample

We present two sampled quasi-Newton methods: sampled LBFGS and sampled LSR1. In contrast to the classical variants of these methods, which sequentially build (inverse) Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate to produce these approximations. As a result, the approximations constructed make use of more reliable (recent … Read more
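
As a rough sketch of the sampling idea (the precise construction is in the paper; the symbols $\sigma$, $u_i$ and $m$ below are illustrative rather than the paper's notation), the curvature pairs at the current iterate $w_k$ can be formed from random directions instead of past iterates:
\[
s_i = \sigma\, u_i, \qquad y_i = \nabla f(w_k + s_i) - \nabla f(w_k), \qquad i = 1, \dots, m,
\]
so that the $m$ pairs $(s_i, y_i)$, all measured around $w_k$, replace the history of $(s, y)$ pairs that classical LBFGS and LSR1 accumulate across iterations.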

Analysis of the BFGS Method with Errors

The classical convergence analysis of quasi-Newton methods assumes that the function and gradients employed at each iteration are exact. In this paper, we consider the case when there are (bounded) errors in both computations and establish conditions under which a slight modification of the BFGS algorithm with an Armijo-Wolfe line search converges to a neighborhood … Read more
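
For reference, and not as a statement of the paper's modified scheme, the exact-evaluation building blocks being perturbed are the BFGS inverse-Hessian update
\[
H_{k+1} = \bigl(I - \rho_k s_k y_k^{\top}\bigr) H_k \bigl(I - \rho_k y_k s_k^{\top}\bigr) + \rho_k s_k s_k^{\top}, \qquad \rho_k = \frac{1}{y_k^{\top} s_k},
\]
with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, together with an Armijo-Wolfe line search that requires
\[
f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^{\top} p_k
\quad\text{and}\quad
\nabla f(x_k + \alpha_k p_k)^{\top} p_k \ge c_2 \nabla f(x_k)^{\top} p_k
\]
for constants $0 < c_1 < c_2 < 1$; in the setting of the abstract, the values of $f$ and $\nabla f$ entering these formulas are only available with bounded errors.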

Local minimizers of semi-algebraic functions

Consider a semi-algebraic function $f\colon\mathbb{R}^n \to {\mathbb{R}},$ which is continuous around a point $\bar{x} \in \mathbb{R}^n.$ Using the so-called {\em tangency variety} of $f$ at $\bar{x},$ we first provide necessary and sufficient conditions for $\bar{x}$ to be a local minimizer of $f,$ and then in the case where $\bar{x}$ is an isolated local minimizer of … Read more
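
To fix the terminology used above (these are the standard definitions, not anything specific to the paper): $\bar{x}$ is a local minimizer of $f$ if
\[
\exists\, \varepsilon > 0 \;\text{ such that }\; f(x) \ge f(\bar{x}) \quad \text{for all } x \text{ with } \|x - \bar{x}\| \le \varepsilon,
\]
and $\bar{x}$ is an isolated local minimizer if, in addition, some neighborhood of $\bar{x}$ contains no local minimizer of $f$ other than $\bar{x}$ itself.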

A note on solving nonlinear optimization problems in variable precision

This short note considers an efficient variant of the trust-region algorithm with dynamic accuracy proposed by Carter (1993) and by Conn, Gould and Toint (2000) as a tool for very high-performance computing, an area where it is critical to allow multi-precision computations in order to keep energy dissipation under control. Numerical experiments are presented indicating that the use … Read more
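
To sketch the dynamic-accuracy mechanism being exploited (in the spirit of the cited references rather than as a description of this note's specific variant), each iteration approximately solves the trust-region subproblem
\[
\min_{\|s\| \le \Delta_k} \; m_k(s) = \bar{f}(x_k) + \bar{g}_k^{\top} s + \tfrac{1}{2}\, s^{\top} B_k s,
\]
where $\bar{f}$ and $\bar{g}_k$ are inexact function and gradient values, and, roughly speaking, the absolute error allowed in $\bar{f}$ is tied to a fraction of the predicted decrease $m_k(0) - m_k(s_k)$; since early iterations typically predict large decreases, they tolerate large errors, which is what allows them to run in low arithmetic precision, with precision raised only as the iterates approach a solution.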

Adaptive regularization algorithms with inexact evaluations for nonconvex optimization

A regularization algorithm using inexact function values and inexact derivatives is proposed and its evaluation complexity analyzed. This algorithm is applicable to unconstrained problems and to problems with inexpensive constraints (that is, constraints whose evaluation and enforcement have negligible cost) under the assumption that the derivative of highest degree is $\beta$-Hölder continuous. It features a … Read more
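
For context, the exact-evaluation core of an adaptive regularization method of order $p$ (the paper works with inexact values of $f$ and its derivatives on top of this) computes the step from the regularized model
\[
s_k \approx \arg\min_{s} \; m_k(s) = T_p(x_k, s) + \frac{\sigma_k}{p+1}\,\|s\|^{p+1},
\]
where $T_p(x_k, s)$ is the $p$th-order Taylor expansion of $f$ around $x_k$ and $\sigma_k > 0$ is a regularization weight that is adapted from iteration to iteration, playing a role analogous to that of a trust-region radius.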

On limited-memory quasi-Newton methods for minimizing a quadratic function

The main focus in this paper is exact linesearch methods for minimizing a quadratic function whose Hessian is positive definite. We give two classes of limited-memory quasi-Newton Hessian approximations that generate search directions parallel to those of the method of preconditioned conjugate gradients, and hence give finite termination on quadratic optimization problems. The Hessian approximations … Read more
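
To make the setting concrete: for a quadratic
\[
f(x) = \tfrac{1}{2}\, x^{\top} H x - c^{\top} x, \qquad H \succ 0,
\]
an exact linesearch along a descent direction $p_k$ uses the step length
\[
\alpha_k = -\,\frac{\nabla f(x_k)^{\top} p_k}{p_k^{\top} H p_k},
\]
and, since preconditioned conjugate gradients terminates in at most $n$ iterations in exact arithmetic, a limited-memory quasi-Newton approximation whose search directions are parallel to the preconditioned CG directions inherits the same finite termination, which is the property the two classes of approximations in the paper are built to have.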

Understanding the Acceleration Phenomenon via High-Resolution Differential Equations

Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms, namely Nesterov’s accelerated gradient method for strongly convex functions (NAG-SC) and Polyak’s heavy-ball method, we study an alternative limiting process that yields high-resolution ODEs. We … Read more
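
As a point of reference (at the level of the usual low-resolution analysis, not the refined one developed in the paper), for a $\mu$-strongly convex $f$ both NAG-SC and the heavy-ball method share the same limiting ODE,
\[
\ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0,
\]
which is precisely why this viewpoint cannot tell the two algorithms apart; the high-resolution ODEs retain correction terms of order $\sqrt{s}$ (with $s$ the step size) that differ between the two methods.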

A Subsampling Line-Search Method with Second-Order Results

In many contemporary optimization problems such as those arising in machine learning, it can be computationally challenging or even infeasible to evaluate an entire function or its derivatives. This motivates the use of stochastic algorithms that sample problem data, a practice that can jeopardize the guarantees obtained through classical globalization techniques in optimization such as a trust … Read more
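
The basic subsampling setup behind such methods (stated generically here, not as the paper's specific algorithm) applies to a finite-sum objective
\[
f(x) = \frac{1}{N} \sum_{i=1}^{N} f_i(x), \qquad
f_{\mathcal{S}}(x) = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} f_i(x), \qquad
g_{\mathcal{S}}(x) = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} \nabla f_i(x),
\]
where $\mathcal{S} \subset \{1, \dots, N\}$ is a random subsample; an Armijo-type backtracking condition such as $f_{\mathcal{S}}(x + \alpha d) \le f_{\mathcal{S}}(x) + c\, \alpha\, g_{\mathcal{S}}(x)^{\top} d$ then holds only with some probability, which is exactly how sampling can jeopardize the guarantees of classical globalization techniques.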

Nonmonotone line searches for unconstrained multiobjective optimization problems

In the last two decades, many descent methods for multiobjective optimization problems have been proposed. In particular, the steepest descent and Newton methods have been studied for the unconstrained case. In both methods, the search directions are computed by solving convex subproblems, and the stepsizes are obtained by an Armijo-type line search. As a consequence, the … Read more
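
For orientation, with objectives $f_1, \dots, f_m$ the steepest descent direction referred to above is typically obtained from the convex subproblem
\[
d(x) = \arg\min_{d \in \mathbb{R}^n} \; \max_{i = 1, \dots, m} \nabla f_i(x)^{\top} d + \tfrac{1}{2}\,\|d\|^2,
\]
and a (monotone) Armijo-type line search accepts a step size $\alpha$ satisfying $f_i(x + \alpha d) \le f_i(x) + \sigma\, \alpha\, \nabla f_i(x)^{\top} d$ for every $i$ and some $\sigma \in (0,1)$; a nonmonotone search relaxes the reference value $f_i(x)$ on the right-hand side (for instance to a maximum over recent iterations), so that occasional increases of the objective values are tolerated.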

A Partial PPA block-wise ADMM for Multi-Block Constrained Separable Convex Optimization

The alternating direction method of multipliers (ADMM) has proved effective for solving two-block separable convex optimization problems subject to linear constraints. However, it is not necessarily convergent when extended directly to the multi-block case. One remedy is to first regroup the multiple blocks of variables into two groups and then apply the classic ADMM to the … Read more
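
For the two-block case on which the classical theory rests, ADMM applied to
\[
\min_{x, z} \; f(x) + g(z) \quad \text{subject to} \quad Ax + Bz = b
\]
uses the augmented Lagrangian $\mathcal{L}_{\beta}(x, z, \lambda) = f(x) + g(z) - \lambda^{\top}(Ax + Bz - b) + \tfrac{\beta}{2}\,\|Ax + Bz - b\|^2$ and iterates
\[
x^{k+1} = \arg\min_{x} \mathcal{L}_{\beta}(x, z^{k}, \lambda^{k}), \qquad
z^{k+1} = \arg\min_{z} \mathcal{L}_{\beta}(x^{k+1}, z, \lambda^{k}), \qquad
\lambda^{k+1} = \lambda^{k} - \beta\,\bigl(Ax^{k+1} + Bz^{k+1} - b\bigr);
\]
extending these updates naively to three or more blocks is what can destroy convergence, which is the difficulty that regrouping and the partial proximal-point (PPA) modification are meant to address.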