Block Coordinate Descent Almost Surely Converges to a Stationary Point Satisfying the Second-order Necessary Condition

Given a non-convex twice continuously differentiable cost function with Lipschitz continuous gradient, we prove that all of the block coordinate gradient descent, block mirror descent and proximal block coordinate descent methods converge to stationary points satisfying the second-order necessary condition, almost surely with random initialization. All our results are ascribed to the center-stable manifold theorem …
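As a rough illustration of the kind of behavior this abstract describes, the following minimal sketch runs cyclic block coordinate gradient descent from a random starting point. The test function $f(x, y) = \tfrac{1}{2}x^2 - \cos(y)$, the step size, the sweep count, and all function names are assumptions chosen for illustration and do not come from the paper. The function has a 1-Lipschitz gradient, saddle points at $y = \pm\pi$, and a minimizer at the origin.

```python
import math
import random


def grad_block(point, block):
    """Partial derivative of f(x, y) = 0.5*x**2 - cos(y) for one block.

    Each coordinate is treated as its own block (an illustrative choice).
    """
    x, y = point
    return x if block == 0 else math.sin(y)


def block_coordinate_gradient_descent(start, step=0.5, sweeps=200):
    """Cyclically update one block at a time with a fixed step size."""
    point = list(start)
    for _ in range(sweeps):
        for block in (0, 1):
            point[block] -= step * grad_block(point, block)
    return point


random.seed(0)
# Random initialization inside (-3, 3)^2, which contains the saddle-free
# basin of the minimizer at the origin but lies close to the saddles at
# y = +/- pi.
start = [random.uniform(-3.0, 3.0), random.uniform(-3.0, 3.0)]
x, y = block_coordinate_gradient_descent(start)

# First-order check: norm of the gradient (x, sin(y)).
grad_norm = math.hypot(x, math.sin(y))
# Second-order check: the Hessian is diag(1, cos(y)).
min_hessian_eig = min(1.0, math.cos(y))
print(grad_norm, min_hessian_eig)
```

In this toy setting the iterates move away from the saddles and settle at a point where the gradient vanishes and the Hessian has no negative eigenvalue, which is the second-order necessary condition the title refers to.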

Run-and-Inspect Method for Nonconvex Optimization and Global Optimality Bounds for R-Local Minimizers

Many optimization algorithms converge to stationary points. When the underlying problem is nonconvex, they may get trapped at local minimizers and occasionally stagnate near saddle points. We propose the Run-and-Inspect Method, which adds an “inspect” phase to existing algorithms that helps escape from non-global stationary points. The inspection samples a set of points in a …

On the use of third-order models with fourth-order regularization for unconstrained optimization

In a recent paper, it was shown that, for the smooth unconstrained optimization problem, worst-case evaluation complexity $O(\epsilon^{-(p+1)/p})$ may be obtained by means of algorithms that employ sequential approximate minimizations of $p$-th order Taylor models plus $(p+1)$-th order regularization terms. The aforementioned result, which assumes Lipschitz continuity of the $p$-th partial derivatives, generalizes …
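The regularized model mentioned in this abstract is commonly written in the following form. This is a sketch under assumptions: the symbols $T_p$, $\sigma$, and $s$, and the scaling $1/(p+1)$ of the regularization term, follow a frequent convention and may differ from the paper's exact notation.

```latex
% p-th order Taylor polynomial of f around x, evaluated at the step s
T_p(x, s) = f(x) + \sum_{j=1}^{p} \frac{1}{j!} \, \nabla^j f(x)[s]^j

% Regularized model minimized (approximately) at each iteration,
% with regularization parameter \sigma > 0
m_p(x, s) = T_p(x, s) + \frac{\sigma}{p+1} \, \|s\|^{p+1}

% Case p = 3: third-order model with fourth-order regularization
m_3(x, s) = T_3(x, s) + \frac{\sigma}{4} \, \|s\|^{4}
```

With $p = 3$, the complexity bound quoted above becomes $O(\epsilon^{-4/3})$.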

Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations

Machine learning (ML) problems are often posed as highly nonlinear and nonconvex unconstrained optimization problems. Methods for solving ML problems based on stochastic gradient descent are easily scaled for very large problems but may involve fine-tuning many hyper-parameters. Quasi-Newton approaches based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) update typically do not require manually tuning hyper-parameters but …

Underestimate Sequences via Quadratic Averaging

In this work we introduce the concept of an Underestimate Sequence (UES), which is a natural extension of Nesterov’s estimate sequence. Our definition of a UES utilizes three sequences, one of which is a lower bound (or under-estimator) of the objective function. The question of how to construct an appropriate sequence of lower bounds is …

Derivative-Free Robust Optimization by Outer Approximations

We develop an algorithm for minimax problems that arise in robust optimization in the absence of objective function derivatives. The algorithm utilizes an extension of methods for inexact outer approximation in sampling a potentially infinite-cardinality uncertainty set. Clarke stationarity of the algorithm output is established alongside desirable features of the model-based trust-region subproblems encountered. We …

A Dense initialization for limited-memory quasi-Newton methods

We consider a family of dense initializations for limited-memory quasi-Newton methods. The proposed initialization exploits an eigendecomposition-based separation of the full space into two complementary subspaces, assigning a different initialization parameter to each subspace. This family of dense initializations is proposed in the context of a limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) trust-region method that makes use …

Manifold Sampling for Optimization of Nonconvex Functions that are Piecewise Linear Compositions of Smooth Components

We develop a manifold sampling algorithm for the minimization of a nonsmooth composite function $f := \psi + h \circ F$ when $\psi$ is smooth with known derivatives, $h$ is a known, nonsmooth, piecewise linear function, and $F$ is smooth but expensive to evaluate. The trust-region algorithm classifies points in the domain of $h$ as …

An Inexact Regularized Newton Framework with a Worst-Case Iteration Complexity of $\mathcal{O}(\epsilon^{-3/2})$ for Nonconvex Optimization

An algorithm for solving smooth nonconvex optimization problems is proposed that, in the worst case, takes $\mathcal{O}(\epsilon^{-3/2})$ iterations to drive the norm of the gradient of the objective function below a prescribed positive real number $\epsilon$ and can take $\mathcal{O}(\epsilon^{-3})$ iterations to drive the leftmost eigenvalue of the Hessian of the objective above $-\epsilon$. The proposed …

Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization

We establish or refute the optimality of inexact second-order methods for unconstrained nonconvex optimization from the point of view of worst-case evaluation complexity, improving and generalizing the results of Cartis, Gould and Toint (2010, 2011). To this aim, we consider a new general class of inexact second-order algorithms for unconstrained optimization that includes regularization and trust-region …