Non-asymptotic superlinear convergence of Nesterov accelerated BFGS

This paper studies the convergence of a Nesterov accelerated variant (NA-BFGS) of the Broyden-Fletcher-Goldfarb-Shanno quasi-Newton method in the setting where the objective function is strongly convex, its gradient is Lipschitz continuous, and its Hessian is Lipschitz continuous at the optimal point. We demonstrate that, like the classic BFGS method, the Nesterov accelerated BFGS method …
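As a rough sketch of the kind of iteration the abstract describes, the following NumPy code combines the standard BFGS inverse-Hessian update with a Nesterov extrapolation step. The fixed momentum parameter beta, the unit step size, and the choice of secant pair (taken between the new iterate and the extrapolation point) are illustrative assumptions here, not the paper's exact scheme.

    import numpy as np

    def na_bfgs(grad, x0, beta=0.9, max_iter=200, tol=1e-10):
        # Illustrative sketch: BFGS with a Nesterov extrapolation step.
        # beta, the unit step size, and the secant pair are assumptions,
        # not the paper's exact scheme.
        n = x0.size
        H = np.eye(n)                    # inverse Hessian approximation
        x_prev = x0.copy()
        x = x0.copy()
        for _ in range(max_iter):
            y = x + beta * (x - x_prev)  # Nesterov extrapolation point
            g = grad(y)
            if np.linalg.norm(g) < tol:
                return y
            x_next = y - H @ g           # quasi-Newton step, unit step size
            s = x_next - y               # displacement
            v = grad(x_next) - g         # gradient difference
            sv = s @ v
            if sv > 1e-12:               # curvature check keeps H positive definite
                rho = 1.0 / sv
                E = np.eye(n) - rho * np.outer(s, v)
                H = E @ H @ E.T + rho * np.outer(s, s)
            x_prev, x = x, x_next
        return x

On a strongly convex quadratic, for example grad = lambda x: A @ x - b with A symmetric positive definite, the iterates converge to the minimizer of f(x) = 0.5 x^T A x - b^T x.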

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

Modern machine learning focuses on highly expressive models that are able to fit or interpolate the data completely, resulting in zero training loss. For such models, we show that the stochastic gradients of common loss functions satisfy a strong growth condition. Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov …
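For concreteness, here is a minimal sketch of constant step-size SGD with Nesterov momentum in the interpolation setting the abstract describes. The stoch_grad signature, the step size, and the momentum value are illustrative assumptions, not the paper's constants (which are derived from the strong growth condition).

    import numpy as np

    def sgd_nesterov(stoch_grad, x0, n_samples, lr=0.005, momentum=0.9, epochs=50):
        # Constant step-size SGD with Nesterov momentum. stoch_grad(x, i) is
        # assumed to return the gradient of the i-th sample's loss at x.
        x = x0.copy()
        v = np.zeros_like(x0)
        for _ in range(epochs):
            for i in np.random.permutation(n_samples):
                g = stoch_grad(x + momentum * v, i)  # gradient at look-ahead point
                v = momentum * v - lr * g
                x = x + v
        return x

    # Example: over-parameterized least squares (more columns than rows), so
    # an interpolating solution with zero training loss exists.
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(20, 50)), rng.normal(size=20)
    sg = lambda x, i: (A[i] @ x - b[i]) * A[i]
    x_hat = sgd_nesterov(sg, np.zeros(50), n_samples=20)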

Gradient methods for convex minimization: better rates under weaker conditions

The convergence behavior of gradient methods for minimizing convex differentiable functions is one of the core questions in convex optimization. This paper shows that their well-known complexities can be achieved under conditions weaker than the commonly assumed ones. We relax the common gradient Lipschitz continuity and strong convexity conditions to ones that hold only over …
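A minimal sketch of the fixed-step gradient method whose rates the abstract discusses, assuming a known smoothness constant L. Under the relaxed conditions, L would only need to bound the gradient's variation over a restricted set containing the iterates (e.g., the initial level set) rather than globally; the names and defaults here are illustrative.

    import numpy as np

    def gradient_descent(grad, x0, L, max_iter=1000, tol=1e-8):
        # Gradient descent with the classical step size 1/L. The constant L
        # is assumed valid over the region the iterates actually visit.
        x = x0.copy()
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            x = x - g / L
        return x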