A minibatch stochastic Quasi-Newton method adapted for nonconvex deep learning problems

In this study, we develop a limited-memory nonconvex Quasi-Newton (QN) method tailored to deep learning (DL) applications. Because the function information sampled in minibatch processing is stochastic, it can degrade the performance of QN methods, and three strategies are used to address this issue. These involve a novel progressive trust-region radius update (suitable for stochastic …
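For context, the sketch below shows the *classical* ratio-based trust-region radius update (the textbook rule), not the progressive update proposed in the paper, whose details are cut off in the abstract above. Here `rho` is the ratio of actual to predicted reduction. In a minibatch setting both reductions are computed from sampled function values, so `rho` is noisy, which is the kind of issue the abstract says its strategies target. The function name, the acceptance threshold `eta=0.1`, and the 1/4 and 3/4 cutoffs are illustrative assumptions.

```python
def trust_region_update(rho, delta, step_norm, delta_max, eta=0.1):
    """Classical trust-region radius update (not the paper's progressive rule).

    rho       -- actual reduction / predicted reduction for the trial step
    delta     -- current trust-region radius
    step_norm -- norm of the trial step
    delta_max -- upper bound on the radius
    eta       -- hypothetical acceptance threshold in [0, 1/4)
    Returns (accept_step, new_radius).
    """
    accept = rho > eta  # accept the step only if the model predicted enough decrease
    if rho < 0.25:
        # Poor agreement between model and function: shrink the region.
        new_delta = 0.25 * delta
    elif rho > 0.75 and step_norm >= (1.0 - 1e-8) * delta:
        # Very good agreement and the step hit the boundary: expand, capped at delta_max.
        new_delta = min(2.0 * delta, delta_max)
    else:
        # Adequate agreement: keep the radius unchanged.
        new_delta = delta
    return accept, new_delta
```

For example, `trust_region_update(0.9, 1.0, 1.0, 10.0)` accepts the step and doubles the radius to `2.0`, while a ratio of `0.05` rejects it and shrinks the radius to `0.25`.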

Quasi-Newton Methods for Deep Learning: Forget the Past, Just Sample

We present two sampled quasi-Newton methods: sampled LBFGS and sampled LSR1. Contrary to the classical variants of these methods that sequentially build (inverse) Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate to produce these approximations. As a result, the approximations constructed make use of more reliable (recent …
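A minimal sketch of the "sample around the current iterate" idea for LBFGS, assuming curvature pairs are formed from gradient differences at points drawn uniformly in a box of half-width `radius` around `x`. The abstract does not state the sampling distribution or whether gradient differences or Hessian-vector products are used, so both choices are assumptions, and `sample_curvature_pairs` is a hypothetical helper. The resulting pairs are fed to the standard L-BFGS two-loop recursion.

```python
import numpy as np

def sample_curvature_pairs(grad, x, m, radius, rng):
    """Hypothetical helper: build up to m pairs (s_i, y_i) from random points near x.

    s_i is a random displacement in a box of half-width `radius` (an assumption);
    y_i = grad(x + s_i) - grad(x) is the corresponding gradient difference.
    """
    g0 = grad(x)
    S, Y = [], []
    for _ in range(m):
        s = radius * rng.uniform(-1.0, 1.0, size=x.shape)
        y = grad(x + s) - g0
        # Keep only pairs with positive curvature so the approximation stays positive definite.
        if s @ y > 1e-10 * (s @ s):
            S.append(s)
            Y.append(y)
    return S, Y

def two_loop(g, S, Y):
    """Standard L-BFGS two-loop recursion: returns H @ g for the inverse-Hessian approximation H."""
    q = g.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(S, Y)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in reversed(list(zip(S, Y, rhos))):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # Initial scaling H0 = (s^T y / y^T y) I from the last pair.
    if S:
        q *= (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])
    # Second loop: oldest pair to newest, reusing alphas in matching order.
    for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
        b = rho * (y @ q)
        q += (a - b) * s
    return q
```

On a strictly convex quadratic every sampled pair has positive curvature, so `-two_loop(grad(x), S, Y)` is a descent direction. Because the pairs are regenerated at each iterate rather than carried over from past iterations, the curvature information reflects the local landscape around `x`, which is the motivation the abstract gives.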

Underestimate Sequences via Quadratic Averaging

In this work we introduce the concept of an Underestimate Sequence (UES), which is a natural extension of Nesterov’s estimate sequence. Our definition of a UES utilizes three sequences, one of which is a lower bound (or under-estimator) of the objective function. The question of how to construct an appropriate sequence of lower bounds is …
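One way to build such lower bounds, suggested by the title, is quadratic averaging. Assume `f` is `mu`-strongly convex. Then every point `y` yields a quadratic under-estimator `f(y) + <g, x - y> + mu/2 ||x - y||^2`, which can be rewritten as `v + mu/2 ||x - c||^2` with `c = y - g/mu` and `v = f(y) - ||g||^2/(2 mu)`. Any convex combination of two such quadratics is again a lower bound of the same form. The sketch below picks the combination weight that maximizes its minimum value `v`. This illustrates the general technique only; whether the paper constructs its lower-bound sequence exactly this way is an assumption, and both function names are hypothetical.

```python
import numpy as np

def lower_quadratic(f_y, g_y, y, mu):
    """Strong-convexity lower bound at y in canonical form v + mu/2 * ||x - c||^2."""
    c = y - g_y / mu
    v = f_y - (g_y @ g_y) / (2.0 * mu)
    return v, c

def average_quadratics(qa, qb, mu):
    """Convex combination of two canonical quadratics (same curvature mu) with the largest minimum.

    lam*Qa + (1-lam)*Qb = v + mu/2 ||x - c||^2 with
    c = lam*ca + (1-lam)*cb and v = lam*va + (1-lam)*vb + mu/2 * lam*(1-lam) * ||ca - cb||^2.
    """
    (va, ca), (vb, cb) = qa, qb
    d2 = float((ca - cb) @ (ca - cb))
    if d2 == 0.0:
        # Same center: just keep the higher of the two minima.
        lam = 1.0 if va >= vb else 0.0
    else:
        # Maximizer of the concave function v(lam), clipped to [0, 1].
        lam = min(1.0, max(0.0, 0.5 + (va - vb) / (mu * d2)))
    c = lam * ca + (1.0 - lam) * cb
    v = lam * va + (1.0 - lam) * vb + 0.5 * mu * lam * (1.0 - lam) * d2
    return v, c
```

For `f(x) = ||x||^2` with `mu = 1`, the bounds at `[1, 0]` and `[0, 1]` each have minimum value `-1`. Their optimal average has minimum value `-0.75`, which is still below the true minimum `0` but tighter than either bound alone.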