Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

Modern machine learning focuses on highly expressive models that are able to fit or interpolate the data completely, resulting in zero training loss. For such models, we show that the stochastic gradients of common loss functions satisfy a strong growth condition. Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov … Read more

Gradient methods for convex minimization: better rates under weaker conditions

The convergence behavior of gradient methods for minimizing convex differentiable functions is one of the core questions in convex optimization. This paper shows that their well-known complexities can be achieved under conditions weaker than the commonly assumed ones. We relax the common gradient Lipschitz-continuity condition and strong convexity condition to ones that hold only over … Read more