Sharan Vaswani – Optimization Online

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

Published: 2019/02/25

Francis Bach
Mark Schmidt
Sharan Vaswani

Convex Optimization, Generalized Convexity/Monotonicity, interpolation, Nesterov acceleration, over-parametrization, stochastic gradient descent

Modern machine learning focuses on highly expressive models that are able to fit or interpolate the data completely, resulting in zero training loss. For such models, we show that the stochastic gradients of common loss functions satisfy a strong growth condition. Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov …
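
The interpolation setting the abstract describes can be illustrated with a small sketch. It runs plain constant step-size SGD (without the Nesterov acceleration the abstract goes on to mention) on an over-parameterized least-squares problem whose labels can be fit exactly. The problem sizes, the step-size choice, and all function names (`make_problem`, `training_loss`, `sgd`) are illustrative assumptions and are not taken from the paper.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def make_problem(n=5, d=20, seed=0):
    """Build a linear regression problem with more features than samples.

    Labels are generated by a hidden weight vector, so at least one w
    fits every sample exactly (the interpolation setting).
    """
    rng = random.Random(seed)
    w_hidden = [rng.gauss(0, 1) for _ in range(d)]
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [dot(x, w_hidden) for x in X]
    return X, y


def training_loss(w, X, y):
    """Average of the per-sample losses 0.5 * (x.w - y)^2."""
    return sum(0.5 * (dot(x, w) - yi) ** 2 for x, yi in zip(X, y)) / len(X)


def sgd(X, y, step_size, iters=5000, seed=1):
    """Constant step-size SGD, one uniformly sampled example per step."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(iters):
        i = rng.randrange(len(X))
        residual = dot(X[i], w) - y[i]
        # Gradient of 0.5 * (x_i.w - y_i)^2 with respect to w is residual * x_i.
        w = [wj - step_size * residual * xj for wj, xj in zip(w, X[i])]
    return w


if __name__ == "__main__":
    X, y = make_problem()
    # Assumed step size: 1 / L_max, where L_max = max_i ||x_i||^2 is the
    # largest per-sample smoothness constant of the squared loss.
    eta = 1.0 / max(dot(x, x) for x in X)
    w = sgd(X, y, eta)
    print("initial loss:", training_loss([0.0] * len(X[0]), X, y))
    print("final loss:  ", training_loss(w, X, y))
```

Because every sample can be fit exactly, the stochastic gradients vanish at the solution, so the step size never needs to be decayed in this toy example and the training loss is driven close to zero.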