A minibatch stochastic Quasi-Newton method adapted for nonconvex deep learning problems

In this study, we develop a limited-memory nonconvex Quasi-Newton (QN) method, tailored to deep learning (DL) applications. Since the stochastic nature of (sampled) function information in minibatch processing can degrade the performance of QN methods, three strategies are employed to overcome this issue. These involve a novel progressive trust-region radius update (suitable for stochastic … Read more
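
A minimal sketch of one ingredient such methods rely on (a generic illustration, not the authors' algorithm; all names are assumptions): forming a quasi-Newton curvature pair from minibatch gradients. Evaluating both gradients on the same minibatch keeps the pair consistent despite sampling noise, and a standard positive-curvature test guards the update.

def curvature_pair(grad_fn, batch, x_old, x_new):
    # x_old, x_new and the gradients are assumed to be NumPy arrays.
    s = x_new - x_old                                   # iterate displacement
    y = grad_fn(x_new, batch) - grad_fn(x_old, batch)   # gradient change on one fixed minibatch
    return s, y

def accept_pair(s, y, eps=1e-8):
    # Keep the pair only if it carries positive curvature; otherwise a BFGS-type
    # approximation may lose positive definiteness.
    return float(s @ y) > eps * float(s @ s)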

A quasi-Newton method with Wolfe line searches for multiobjective optimization

We propose a BFGS method with Wolfe line searches for unconstrained multiobjective optimization problems. The algorithm is well defined even for general nonconvex problems. Global convergence and R-linear convergence to a Pareto optimal point are established for strongly convex problems. In the local convergence analysis, if the objective functions are locally strongly convex with Lipschitz continuous … Read more
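
For reference, a sketch of the standard single-objective Wolfe conditions that the paper generalizes to the multiobjective setting; the function names and constants below are illustrative assumptions.

def wolfe_conditions(f, grad, x, d, t, c1=1e-4, c2=0.9):
    # x and d are NumPy arrays; d is a descent direction, t a trial step length.
    g0 = grad(x) @ d
    sufficient_decrease = f(x + t * d) <= f(x) + c1 * t * g0   # Armijo (sufficient decrease)
    curvature = grad(x + t * d) @ d >= c2 * g0                 # curvature condition
    return sufficient_decrease and curvature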

LSOS: Line-search Second-Order Stochastic optimization methods for nonconvex finite sums

We develop a line-search second-order algorithmic framework for minimizing finite sums. We do not make any convexity assumptions, but require the terms of the sum to be continuously differentiable and have Lipschitz-continuous gradients. The methods fitting into this framework combine line searches and suitably decaying step lengths. A key issue is a two-step sampling at … Read more
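
A rough sketch, under assumed details, of the two ingredients named above: an Armijo-style backtracking search on a sampled objective combined with a decaying cap on the step length. This is not the LSOS framework itself.

def backtracking_step(f_batch, g_batch, x, d, k, c1=1e-4, beta=0.5, t0=1.0):
    # d is assumed to be a descent direction for the sampled objective f_batch.
    t = min(t0, 1.0 / (k + 1))                  # decaying upper bound on the step at iteration k
    fx, gTd = f_batch(x), g_batch(x) @ d
    while f_batch(x + t * d) > fx + c1 * t * gTd:
        t *= beta                               # shrink until sufficient decrease holds
    return x + t * d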

A Proximal Quasi-Newton Trust-Region Method for Nonsmooth Regularized Optimization

We develop a trust-region method for minimizing the sum of a smooth term f and a nonsmooth term h, both of which can be nonconvex. Each iteration of our method minimizes a possibly nonconvex model of f+h in a trust region. The model coincides with f+h in value and subdifferential at the center. We establish global … Read more
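
As an illustration of the kind of subproblem described (notation assumed, not taken from the paper): the model psi(s) = f(x) + g(x)^T s + 0.5 s^T B s + h(x + s), minimized over a trust region, together with the actual-versus-predicted reduction ratio used to accept steps.

def model_value(f_x, g_x, B, h, x, s):
    # Quadratic-plus-nonsmooth model; all vectors and B are NumPy arrays.
    return f_x + g_x @ s + 0.5 * s @ (B @ s) + h(x + s)

def reduction_ratio(f, h, x, s, f_x, g_x, B):
    # Actual vs. predicted reduction; the model at s = 0 equals f(x) + h(x).
    actual = (f(x) + h(x)) - (f(x + s) + h(x + s))
    predicted = (f_x + h(x)) - model_value(f_x, g_x, B, h, x, s)
    return actual / predicted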

A structured modified Newton approach for solving systems of nonlinear equations arising in interior-point methods for quadratic programming

The focus of this work is interior-point methods for quadratic optimization problems with linear inequality constraints, where the systems of nonlinear equations that arise are solved with Newton-like methods. In particular, the concern is the system of linear equations to be solved at each iteration. Newton systems give high-quality solutions, but there is an … Read more
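
For context, a sketch of the standard primal-dual Newton (KKT) system for an inequality-constrained QP, min 0.5 x'Hx + c'x subject to Ax >= b with slacks s > 0 and multipliers lam > 0. The paper studies structured modifications and approximations of systems of this kind; the generic assembly below is only an assumed illustration.

import numpy as np

def newton_kkt(H, A, c, b, x, lam, s, mu):
    m, n = A.shape
    S, L = np.diag(s), np.diag(lam)
    K = np.block([
        [H,                 -A.T,              np.zeros((n, m))],
        [A,                  np.zeros((m, m)), -np.eye(m)      ],
        [np.zeros((m, n)),   S,                 L              ],
    ])
    rhs = -np.concatenate([
        H @ x + c - A.T @ lam,        # dual feasibility residual
        A @ x - s - b,                # primal feasibility residual
        lam * s - mu,                 # perturbed complementarity residual
    ])
    return np.linalg.solve(K, rhs)    # stacked step (dx, dlam, ds)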

Projected-Search Methods for Bound-Constrained Optimization

Projected-search methods for bound-constrained minimization are based on performing a line search along a continuous piecewise-linear path obtained by projecting a search direction onto the feasible region. A potential benefit of a projected-search method is that many changes to the active set can be made at the cost of computing a single search direction. As … Read more
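
A simplified sketch of the projected path such methods search along, x(t) = P(x + t*d) with P the projection onto the bound constraints. The Armijo-style acceptance test shown here is a common choice and an assumption, not necessarily the rule used in the paper.

import numpy as np

def project(z, lower, upper):
    # Projection onto the box [lower, upper], componentwise.
    return np.minimum(np.maximum(z, lower), upper)

def projected_search(f, grad, x, d, lower, upper, t=1.0, c1=1e-4, beta=0.5):
    fx, g = f(x), grad(x)
    while True:
        x_new = project(x + t * d, lower, upper)    # point on the piecewise-linear path
        if f(x_new) <= fx + c1 * g @ (x_new - x) or t < 1e-12:
            return x_new
        t *= beta                                   # backtrack along the projected path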

Compact Representations of Structured BFGS Matrices

For general large-scale optimization problems, compact representations exist in which recursive quasi-Newton update formulas are represented as compact matrix factorizations. For problems in which the objective function contains additional structure, so-called structured quasi-Newton methods exploit available second-derivative information and approximate unavailable second derivatives. This article develops the compact representations of two structured Broyden-Fletcher-Goldfarb-Shanno update formulas. … Read more
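
For orientation, the classical compact representation of the (unstructured) BFGS matrix due to Byrd, Nocedal, and Schnabel, which the article extends to structured updates: B = B0 - [B0*S, Y] * [[S'B0S, L], [L', -D]]^{-1} * [S'B0; Y'], where S and Y collect the stored pairs, L is the strictly lower triangular part of S'Y and D its diagonal. The code below is a straightforward sketch of that known formula with B0 = gamma*I.

import numpy as np

def bfgs_compact(S, Y, gamma):
    # S, Y are n-by-m matrices whose columns are the stored pairs (s_i, y_i).
    n, m = S.shape
    B0S = gamma * S                          # B0 = gamma * I
    StY = S.T @ Y
    L = np.tril(StY, k=-1)                   # strictly lower triangular part of S'Y
    D = np.diag(np.diag(StY))                # diag(s_i' y_i)
    M = np.block([[S.T @ B0S, L], [L.T, -D]])
    U = np.hstack([B0S, Y])
    return gamma * np.eye(n) - U @ np.linalg.solve(M, U.T)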

A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization

We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving empirical risk minimization (ERM) problems with a nonsmooth regularization term. Our algorithm is applicable to both the primal and the dual ERM problem. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or … Read more
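
To fix ideas about the problem class, a small sketch of one common primal instance of regularized ERM, min_w (1/n) * sum_i loss(a_i' w, b_i) + g(w) with g nonsmooth; the logistic loss and l1 regularizer here are assumptions for illustration only.

import numpy as np

def primal_erm_objective(w, A, b, lam):
    # A holds the data rows a_i, b holds labels in {-1, +1}, lam is the l1 weight.
    margins = b * (A @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + lam * np.sum(np.abs(w))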

Limited-Memory BFGS with Displacement Aggregation

A displacement aggregation strategy is proposed for the curvature pairs stored in a limited-memory BFGS (a.k.a. L-BFGS) method such that the resulting (inverse) Hessian approximations are equal to those that would be derived from a full-memory BFGS method. This means that, if a sufficiently large number of pairs are stored, then an optimization algorithm employing … Read more
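
For reference, the standard L-BFGS two-loop recursion that consumes the stored curvature pairs; displacement aggregation changes which pairs are stored, not this recursion. The sketch below follows the textbook recursion with an assumed scalar initial approximation gamma*I.

def two_loop(g, pairs, gamma):
    # g is the gradient (NumPy array); pairs is a list of (s, y) from oldest to newest;
    # gamma is typically (s'y)/(y'y) for the most recent pair.
    q = g.copy()
    alphas = []
    for s, y in reversed(pairs):             # newest to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append((rho, a))
        q -= a * y
    r = gamma * q                            # apply the initial inverse approximation
    for (rho, a), (s, y) in zip(reversed(alphas), pairs):   # oldest to newest
        beta = rho * (y @ r)
        r += (a - beta) * s
    return r                                 # approximates (inverse Hessian) * g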

Quasi-Newton Methods for Deep Learning: Forget the Past, Just Sample

We present two sampled quasi-Newton methods: sampled LBFGS and sampled LSR1. Contrary to the classical variants of these methods that sequentially build (inverse) Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate to produce these approximations. As a result, the approximations constructed make use of more reliable (recent … Read more
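
A simplified sketch of the sampling idea described above (parameter names and the sampling distribution are assumptions): draw points around the current iterate and form curvature pairs from gradient differences, which can then feed standard L-BFGS or L-SR1 updates.

import numpy as np

def sample_pairs(grad, w, m=10, radius=0.1, rng=np.random.default_rng(0)):
    # grad maps a NumPy array to the gradient at that point; w is the current iterate.
    g_w = grad(w)
    pairs = []
    for _ in range(m):
        s = radius * rng.standard_normal(w.shape)   # random displacement around w
        y = grad(w + s) - g_w                       # corresponding gradient change
        pairs.append((s, y))
    return pairs                                    # curvature pairs for L-BFGS / L-SR1 updates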