Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations

Machine learning (ML) problems are often posed as highly nonlinear and nonconvex unconstrained optimization problems. Methods based on stochastic gradient descent scale easily to very large problems but may require fine-tuning many hyper-parameters. Quasi-Newton approaches based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) update typically do not require manually tuning hyper-parameters but …
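
Since the abstract is truncated here, the following is only a generic sketch of the computational kernel such methods revolve around: minimizing the quadratic model $g^Ts + \frac{1}{2}s^TBs$ over a trust region $\|s\| \le \delta$ when the quasi-Newton matrix $B$ may be indefinite. The NumPy code, the bisection scheme, and the function name are illustrative assumptions, not the paper's algorithm.

    # Minimal sketch (not the paper's method): a More-Sorensen-style solve of
    # the trust-region subproblem via a dense eigendecomposition of B.
    import numpy as np

    def trust_region_step(B, g, delta):
        """Minimize g.s + 0.5*s.B.s over ||s|| <= delta; B may be indefinite."""
        lam, Q = np.linalg.eigh(B)                # B = Q diag(lam) Q^T
        gq = Q.T @ g                              # gradient in the eigenbasis
        norm_s = lambda sig: np.linalg.norm(gq / (lam + sig))
        # Any shift sigma > max(0, -lam_min) makes B + sigma*I positive definite.
        lo = max(0.0, -lam.min()) + 1e-12
        if norm_s(lo) <= delta:                   # step fits inside the region
            return Q @ (-gq / (lam + lo))         # (the "hard case" is glossed over)
        hi = lo + 1.0
        while norm_s(hi) > delta:                 # bracket the boundary shift
            hi *= 2.0
        for _ in range(100):                      # bisect on ||s(sigma)|| = delta
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if norm_s(mid) > delta else (lo, mid)
        return Q @ (-gq / (lam + hi))

    B = np.array([[1.0, 0.0], [0.0, -2.0]])       # an indefinite model Hessian
    print(trust_region_step(B, np.array([1.0, 1.0]), delta=0.5))

A limited-memory implementation would exploit the compact representation of $B$ rather than a dense eigendecomposition.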

Underestimate Sequences via Quadratic Averaging

In this work we introduce the concept of an Underestimate Sequence (UES), a natural extension of Nesterov's estimate sequence. Our definition of a UES uses three sequences, one of which is a lower bound (an under-estimator) of the objective function. The question of how to construct an appropriate sequence of lower bounds is …
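
As a reading aid, here is a toy numerical sketch of the lower-bound idea: each iterate of a $\mu$-strongly convex function yields a quadratic minorant, and a convex combination of two such minorants still under-estimates the objective. The weights, step sizes, and quadratic test function below are illustrative assumptions, not the paper's construction.

    # Toy sketch: maintain a quadratic lower bound phi(y) = phi + mu/2*||y - v||^2
    # on f by averaging it with a fresh strong-convexity minorant at each iterate.
    import numpy as np

    A  = np.diag([1.0, 4.0]); mu, Lip = 1.0, 4.0  # f is mu-strongly convex
    f  = lambda x: 0.5 * x @ A @ x                # its minimum value is 0
    df = lambda x: A @ x

    x = np.array([3.0, -2.0])
    g = df(x)
    v, phi = x - g / mu, f(x) - g @ g / (2 * mu)  # minorant of f at the start
    alpha = (mu / Lip) ** 0.5                     # averaging weight
    for _ in range(40):
        x = x - df(x) / Lip                       # plain gradient step on f
        g = df(x)
        u, b = x - g / mu, f(x) - g @ g / (2 * mu)  # fresh minorant at x
        w = (1 - alpha) * v + alpha * u           # "quadratic averaging": a convex
        phi = ((1 - alpha) * phi + alpha * b      # combination of two quadratics
               + mu / 2 * ((1 - alpha) * (v @ v) + alpha * (u @ u) - w @ w))
        v = w
    print(phi, f(x))                              # phi <= min f = 0 <= f(x)

The printed pair brackets the optimal value: the gap $f(x_k) - \phi_k$ certifies how close the iterate is to optimality.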

Derivative-Free Robust Optimization by Outer Approximations

We develop an algorithm for minimax problems that arise in robust optimization in the absence of objective function derivatives. The algorithm extends inexact outer-approximation methods to sample an uncertainty set of potentially infinite cardinality. Clarke stationarity of the algorithm's output is established, alongside desirable features of the model-based trust-region subproblems encountered. We …
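
A minimal sketch of the outer-approximation loop this suggests, on a toy problem (the objective $F$, the sampled set $U$, and the use of Nelder-Mead as the derivative-free inner solver are all assumptions for illustration, not the paper's components):

    # Sketch: min_x max_{u in U} F(x, u) via a growing finite scenario set U_k.
    import numpy as np
    from scipy.optimize import minimize

    F = lambda z, u: (z[0] - u) ** 2 + 0.1 * u * z[0]  # toy objective F(x, u)
    U = np.linspace(-1.0, 1.0, 201)                    # sampled uncertainty set

    x, Uk = np.array([2.0]), [U[0]]
    for _ in range(10):
        # Outer: derivative-free min of the max over scenarios found so far.
        x = minimize(lambda z: max(F(z, u) for u in Uk), x,
                     method="Nelder-Mead").x
        # Inner: the worst-case scenario at the new x; enlarge U_k if it is new.
        u_worst = max(U, key=lambda u: F(x, u))
        if any(np.isclose(u_worst, u) for u in Uk):
            break                                      # no new scenario: stop
        Uk.append(u_worst)
    print(x, Uk)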

A Dense Initialization for Limited-Memory Quasi-Newton Methods

We consider a family of dense initializations for limited-memory quasi-Newton methods. The proposed initialization exploits an eigendecomposition-based separation of the full space into two complementary subspaces, assigning a different initialization parameter to each subspace. This family of dense initializations is proposed in the context of a limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) trust-region method that makes use …
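
The two-parameter structure can be stated concretely. A minimal sketch with made-up parameter values (the paper's choice of subspaces and parameters is what the truncated text goes on to describe):

    # Sketch: a dense initialization B0 = gamma1*P*P^T + gamma2*(I - P*P^T),
    # i.e. parameter gamma1 on range(P) and gamma2 on its orthogonal complement.
    import numpy as np

    def dense_init(P, gamma1, gamma2):
        PPt = P @ P.T                       # orthogonal projector onto range(P)
        return gamma1 * PPt + gamma2 * (np.eye(P.shape[0]) - PPt)

    n = 5
    P, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((n, 2)))
    B0 = dense_init(P, gamma1=3.0, gamma2=0.5)
    # Eigenvalues: gamma1 (twice, on range(P)) and gamma2 (three times).
    print(np.round(np.linalg.eigvalsh(B0), 6))

Unlike the usual scalar initialization $\gamma I$, this $B_0$ is dense; the appeal of such constructions is that $B_0$ can still be applied implicitly within compact limited-memory representations.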

Manifold Sampling for Optimization of Nonconvex Functions that are Piecewise Linear Compositions of Smooth Components

We develop a manifold sampling algorithm for the minimization of a nonsmooth composite function $f := \psi + h \circ F$ when $\psi$ is smooth with known derivatives, $h$ is a known, nonsmooth, piecewise linear function, and $F$ is smooth but expensive to evaluate. The trust-region algorithm classifies points in the domain of $h$ as …
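
To make the composite structure concrete, a small sketch with $h = \|\cdot\|_1$ (one possible piecewise linear $h$; the paper treats the general case): on the manifold where the sign pattern of $F(x)$ is fixed, $f$ is smooth and its gradient is $\nabla\psi(x) + \nabla F(x)^T \operatorname{sign}(F(x))$.

    # Sketch: a gradient of f = psi + h(F) on the manifold selected by sign(F(x)).
    import numpy as np

    psi      = lambda x: 0.5 * np.dot(x, x)
    grad_psi = lambda x: x
    F        = lambda x: np.array([x[0] - 1.0, x[0] + x[1]])  # expensive in practice
    J_F      = lambda x: np.array([[1.0, 0.0], [1.0, 1.0]])   # Jacobian of F
    h        = lambda z: np.abs(z).sum()                      # piecewise linear

    def manifold_gradient(x):
        signs = np.sign(F(x))             # identifies the active linear piece of h
        return grad_psi(x) + J_F(x).T @ signs

    x = np.array([2.0, -0.5])
    print(psi(x) + h(F(x)), manifold_gradient(x))

Manifold sampling builds its local model from such gradient information gathered across nearby manifolds, rather than by smoothing $h$.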

An Inexact Regularized Newton Framework with a Worst-Case Iteration Complexity of $\mathcal{O}(\epsilon^{-3/2})$ for Nonconvex Optimization

An algorithm for solving smooth nonconvex optimization problems is proposed that, in the worst case, takes $\mathcal{O}(\epsilon^{-3/2})$ iterations to drive the norm of the gradient of the objective function below a prescribed positive real number $\epsilon$, and can take $\mathcal{O}(\epsilon^{-3})$ iterations to drive the leftmost eigenvalue of the Hessian of the objective above $-\epsilon$. The proposed …
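
A hedged sketch of the kind of step underlying such $\mathcal{O}(\epsilon^{-3/2})$ methods (a generic regularized Newton step in the spirit of cubic regularization, not the paper's framework): choose $\lambda \ge 0$ so that $(H + \lambda I)s = -g$ with $\lambda = \sigma\|s\|$.

    # Sketch: one regularized Newton step; lam - sigma*||s(lam)|| is increasing
    # in lam, so the coupling lam = sigma*||s|| can be resolved by bisection.
    import numpy as np

    def regularized_newton_step(H, g, sigma):
        step = lambda lam: np.linalg.solve(H + lam * np.eye(len(g)), -g)
        lo = max(0.0, -np.linalg.eigvalsh(H).min()) + 1e-12
        hi = lo + 1.0
        while hi < sigma * np.linalg.norm(step(hi)):    # bracket the fixed point
            hi *= 2.0
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if mid < sigma * np.linalg.norm(step(mid)) else (lo, mid)
        return step(hi)

    H = np.array([[1.0, 0.0], [0.0, -1.0]])             # an indefinite Hessian
    print(regularized_newton_step(H, np.array([1.0, 1.0]), sigma=1.0))

The "inexact" aspect of such frameworks lies in allowing the linear solves and eigenvalue computations above to be performed only approximately.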

Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization

We establish or refute the optimality of inexact second-order methods for unconstrained nonconvex optimization from the point of view of worst-case evaluation complexity, improving and generalizing the results of Cartis, Gould and Toint (2010, 2011). To this end, we consider a new general class of inexact second-order algorithms for unconstrained optimization that includes regularization and trust-region …
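
For context, the benchmark bounds at stake, restated from the cited literature (standard results, not claims of this paper): for smooth nonconvex $f$, reaching $\|\nabla f(x_k)\| \le \epsilon$ requires, in the worst case,

    \[
      \underbrace{\mathcal{O}(\epsilon^{-2})}_{\text{steepest descent}}
      \qquad\text{vs.}\qquad
      \underbrace{\mathcal{O}(\epsilon^{-3/2})}_{\text{cubic regularization}}
    \]

evaluations; the question is for which inexact second-order methods the sharper bound is attained and provably optimal.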

On the use of the saddle formulation in weakly-constrained 4D-VAR data assimilation

This paper discusses the practical use of the saddle variational formulation for the weakly-constrained 4D-VAR method in data assimilation. It is shown that the method, in its original form, may produce erratic results or diverge because of the inherent lack of monotonicity of the produced objective function values. Convergent, variationally coherent variants of the algorithm …
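
For orientation, the weak-constraint 4D-VAR objective in its standard form (notation assumed here, not taken from the paper): with background state $x_b$, observations $y_j$, model operators $M_j$, observation operators $H_j$, and covariances $B$, $Q_j$, $R_j$,

    \[
      J(x_0,\dots,x_N) = \tfrac{1}{2}\|x_0 - x_b\|_{B^{-1}}^2
        + \tfrac{1}{2}\sum_{j=0}^{N}\|H_j(x_j) - y_j\|_{R_j^{-1}}^2
        + \tfrac{1}{2}\sum_{j=1}^{N}\|x_j - M_j(x_{j-1})\|_{Q_j^{-1}}^2 .
    \]

The saddle formulation rewrites the optimality conditions of the linearized problem as one large symmetric indefinite (saddle-point) system, which parallelizes well across the time window but, as noted above, does not by itself guarantee monotone decrease of $J$.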

On types of degenerate critical points of real polynomial functions

In this paper, we consider the problem of identifying the type (local minimizer, maximizer, or saddle point) of a given isolated, degenerate real critical point $c$ of a multivariate polynomial function $f$. To this end, we introduce the definition of the faithful radius of $c$ by means of the curve of tangency of $f$. …
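
A tiny illustration of why degeneracy makes classification hard (a generic example, unrelated to the paper's faithful-radius machinery): both polynomials below have a critical point at the origin with a singular Hessian, so the second-derivative test is inconclusive, yet their types differ.

    # Both Hessians at the origin are singular (det H = 0), but the first
    # origin is a local minimizer and the second is a saddle point.
    import sympy as sp

    x, y = sp.symbols("x y")
    for f in (x**2 + y**4, x**2 - y**4):
        grad = [sp.diff(f, v).subs({x: 0, y: 0}) for v in (x, y)]
        H = sp.hessian(f, (x, y)).subs({x: 0, y: 0})
        print(f, "| grad =", grad, "| det H =", H.det())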

Gradient Descent using Duality Structures

Gradient descent is commonly used to solve optimization problems arising in machine learning, such as training neural networks. Although it appears effective for many different neural network training problems, it is unclear whether the effectiveness of gradient descent can be explained by existing performance guarantees for the algorithm. We argue that existing analyses …