A Progressive Batching L-BFGS Method for Machine Learning

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise …
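
To make the batching idea concrete, here is a minimal sketch (not the authors' algorithm) of L-BFGS with a progressively growing batch on a toy least-squares problem; the variance-based batch-growth test and the crude fixed steps below are generic placeholders standing in for the paper's tests and line search.

```python
# Minimal sketch: L-BFGS with a gradually growing batch on a toy
# least-squares problem.  The two-loop recursion is standard; the simple
# variance-based batch-growth test and the fixed steps are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_samples, dim = 2000, 20
A = rng.normal(size=(n_samples, dim))
x_true = rng.normal(size=dim)
b = A @ x_true + 0.1 * rng.normal(size=n_samples)

def grad_i(x, idx):
    """Per-sample gradients of 0.5*(a_i^T x - b_i)^2, shape (len(idx), dim)."""
    r = A[idx] @ x - b[idx]
    return A[idx] * r[:, None]

def two_loop(g, s_list, y_list):
    """Standard L-BFGS two-loop recursion: returns H_k @ g."""
    q, alphas = g.copy(), []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if s_list:                               # initial scaling gamma_k * I
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        beta = (y @ q) / (y @ s)
        q += (a - beta) * s
    return q

x = np.zeros(dim)
batch, memory = 32, 10
s_list, y_list = [], []
for k in range(200):
    idx = rng.choice(n_samples, size=min(batch, n_samples), replace=False)
    G = grad_i(x, idx)
    g = G.mean(axis=0)
    # Illustrative "norm"-style test: grow the batch when the estimated
    # variance of the batch-mean gradient dominates the squared gradient norm.
    var = G.var(axis=0).sum() / len(idx)
    if var > g @ g and batch < n_samples:
        batch = min(2 * batch, n_samples)
    d = -two_loop(g, s_list, y_list)
    alpha = 1.0 if s_list else 1e-3          # crude fixed steps, no line search
    x_new = x + alpha * d
    g_new = grad_i(x_new, idx).mean(axis=0)  # same batch, so y is consistent
    s, y = x_new - x, g_new - g
    if s @ y > 1e-10:                        # curvature check before storing the pair
        s_list.append(s); y_list.append(y)
        if len(s_list) > memory:
            s_list.pop(0); y_list.pop(0)
    x = x_new

print("final full-batch gradient norm:",
      np.linalg.norm(A.T @ (A @ x - b) / n_samples))
```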

Regional Complexity Analysis of Algorithms for Nonconvex Smooth Optimization

A strategy is proposed for characterizing the worst-case performance of algorithms for solving nonconvex smooth optimization problems. Contemporary analyses characterize worst-case performance by providing, under certain assumptions on an objective function, an upper bound on the number of iterations (or function or derivative evaluations) required until a pth-order stationarity condition is approximately satisfied. This arguably …

Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs

This paper analyzes the iteration-complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs. More specifically, the objective function is of the form f + h where f is a differentiable function whose gradient is Lipschitz continuous and h is a closed convex function with a bounded domain. …
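
In generic notation (the symbols below are placeholders, not necessarily the paper's), the problem class and the quadratic penalty subproblem have the form
\[
\min_{x}\ f(x) + h(x) \quad \text{s.t.}\quad Ax = b,
\]
\[
\min_{x}\ f(x) + h(x) + \frac{c}{2}\,\|Ax - b\|^2 ,
\]
where $c > 0$ is a penalty parameter; the penalized problem of the second form is presumably what the accelerated inexact proximal point iterations target, with the choice of $c$ and the inexactness tolerances driving the overall iteration complexity.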

Block Coordinate Descent Almost Surely Converges to a Stationary Point Satisfying the Second-order Necessary Condition

Given a non-convex twice continuously differentiable cost function with Lipschitz continuous gradient, we prove that the block coordinate gradient descent, block mirror descent, and proximal block coordinate descent methods all converge to stationary points satisfying the second-order necessary condition, almost surely with random initialization. All our results rely on the center-stable manifold theorem …
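
As a deliberately simple illustration of the setting, here is a block coordinate gradient descent sketch on the nonconvex matrix-factorization objective $f(U,V) = \tfrac{1}{2}\|M - UV^\top\|_F^2$, which has a saddle point at $(U,V) = (0,0)$ that a random start avoids; the fixed step size is illustrative and is not the paper's condition.

```python
# Block coordinate gradient descent on f(U, V) = 0.5 * ||M - U V^T||_F^2,
# alternating full gradient steps over the two blocks U and V.
# (0, 0) is a saddle point of f; a random start steers the iterates away from
# it, consistent with the almost-sure second-order convergence described above.
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 30, 20, 3
M = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))   # exactly rank-r target

U = 0.01 * rng.normal(size=(m, r))    # random initialization near the saddle
V = 0.01 * rng.normal(size=(n, r))
step = 2e-3                           # illustrative fixed step size

for k in range(10000):
    R = U @ V.T - M                   # residual with current V
    U -= step * (R @ V)               # gradient step in block U (V held fixed)
    R = U @ V.T - M                   # refresh residual with updated U
    V -= step * (R.T @ U)             # gradient step in block V (U held fixed)

print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```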

Run-and-Inspect Method for Nonconvex Optimization and Global Optimality Bounds for R-Local Minimizers

Many optimization algorithms converge to stationary points. When the underlying problem is nonconvex, they may get trapped at local minimizers and occasionally stagnate near saddle points. We propose the Run-and-Inspect Method, which adds an “inspect” phase to existing algorithms to help them escape from non-global stationary points. The inspection samples a set of points in a …
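
The following sketch illustrates the run-and-inspect pattern on a separable multimodal test function; the coordinate-wise grid sampling and the radius used here are crude stand-ins for the paper's sampling scheme and its R-local optimality guarantees.

```python
# A rough sketch of the run-and-inspect idea on a separable multimodal test
# function (illustrative only; the paper's sampling pattern and guarantees
# are more refined than this).
import numpy as np

rng = np.random.default_rng(2)

def f(x):                                   # nonconvex, many local minima
    return np.sum(x**2) + 10.0 * np.sum(1.0 - np.cos(x))

def grad_f(x):
    return 2.0 * x + 10.0 * np.sin(x)

def run_phase(x, step=0.05, tol=1e-8, max_iter=10000):
    """'Run': plain gradient descent until (near-)stationarity."""
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * g
    return x

def inspect_phase(x, R=3.0, n_grid=50):
    """'Inspect': perturb one coordinate at a time within radius R and return
    the best candidate found (a crude stand-in for the paper's sampling)."""
    best_x, best_val = x.copy(), f(x)
    for i in range(x.size):
        for t in np.linspace(-R, R, n_grid):
            cand = x.copy()
            cand[i] += t
            val = f(cand)
            if val < best_val - 1e-8:
                best_x, best_val = cand, val
    return best_x, best_val

x = rng.uniform(-6.0, 6.0, size=5)          # random start; some coordinates
for _ in range(20):                         # typically land in non-global basins
    x = run_phase(x)                        # converge to a stationary point
    x_new, val_new = inspect_phase(x)
    if val_new < f(x) - 1e-8:               # better point found: resume the run
        x = x_new
    else:                                   # no improvement within radius R
        break

print(f"final value: {f(x):.6f}")           # ~0 at the global minimizer x = 0
```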

Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations

Machine learning (ML) problems are often posed as highly nonlinear and nonconvex unconstrained optimization problems. Methods for solving ML problems based on stochastic gradient descent are easily scaled for very large problems but may involve fine-tuning many hyper-parameters. Quasi-Newton approaches based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) update typically do not require manually tuning hyper-parameters but …

Globally Solving the Trust Region Subproblem Using Simple First-Order Methods

We consider the trust region subproblem, which is the minimization of a quadratic, not necessarily convex, function over the Euclidean ball. Based on the well-known second-order necessary and sufficient optimality conditions for this problem, we present two sufficient optimality conditions defined solely in terms of the primal variables. Each of these conditions corresponds …
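
For reference, in generic notation (not necessarily the authors'), the subproblem and its classical global optimality conditions read
\[
\min_{x \in \mathbb{R}^n}\ \tfrac{1}{2}\, x^{\top} A x + b^{\top} x
\quad \text{s.t.} \quad \|x\| \le r,
\]
where $x^{*}$ is a global minimizer if and only if there exists $\lambda^{*} \ge 0$ such that
\[
(A + \lambda^{*} I)\, x^{*} = -b, \qquad
\lambda^{*}\big(\|x^{*}\| - r\big) = 0, \qquad
A + \lambda^{*} I \succeq 0 .
\]
The sufficient conditions referred to in the abstract are stated purely in terms of $x$, without reference to the multiplier $\lambda$.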

An Inexact Regularized Newton Framework with a Worst-Case Iteration Complexity of $\mathcal{O}(\epsilon^{-3/2})$ for Nonconvex Optimization

An algorithm for solving smooth nonconvex optimization problems is proposed that, in the worst-case, takes $\mathcal{O}(\epsilon^{-3/2})$ iterations to drive the norm of the gradient of the objective function below a prescribed positive real number $\epsilon$ and can take $\mathcal{O}(\epsilon^{-3})$ iterations to drive the leftmost eigenvalue of the Hessian of the objective above $-\epsilon$. The proposed …
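
Restating the abstract's guarantees in symbols: for a tolerance $\epsilon > 0$, the method takes at most
\[
\mathcal{O}(\epsilon^{-3/2}) \ \text{iterations to reach} \ \|\nabla f(x_k)\| \le \epsilon,
\qquad
\mathcal{O}(\epsilon^{-3}) \ \text{iterations to reach} \ \lambda_{\min}\!\big(\nabla^2 f(x_k)\big) \ge -\epsilon .
\]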

Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization

We establish or refute the optimality of inexact second-order methods for unconstrained nonconvex optimization from the point of view of worst-case evaluation complexity, improving and generalizing the results of Cartis, Gould and Toint (2010, 2011). To this end, we consider a new general class of inexact second-order algorithms for unconstrained optimization that includes regularization and trust-region …

Behavior of accelerated gradient methods near critical points of nonconvex functions

We examine the behavior of accelerated gradient methods in smooth nonconvex unconstrained optimization, focusing in particular on their behavior near strict saddle points. Accelerated methods are iterative methods that typically step along a direction that is a linear combination of the previous step and the gradient of the function evaluated at a point at or …
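
Written out, the iterations in question have the generic form (notation mine)
\[
x_{k+1} = x_k + \beta_k\,(x_k - x_{k-1}) - \alpha_k \nabla f\big(x_k + \gamma_k (x_k - x_{k-1})\big),
\]
where $\gamma_k = 0$ gives the heavy-ball method, $\gamma_k = \beta_k$ a Nesterov-type scheme, and $\beta_k = \gamma_k = 0$ plain gradient descent; a strict saddle point is a stationary point at which the Hessian has at least one negative eigenvalue.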