Acceleration by Random Stepsizes: Hedging, Equalization, and the Arcsine Stepsize Schedule

We show that for separable convex optimization, random stepsizes fully accelerate Gradient Descent. Specifically, using inverse stepsizes i.i.d. from the Arcsine distribution improves the convergence rate from $O(k)$ to $O(\sqrt(k))$, where $k$ is the condition number. No momentum or other algorithmic modifications are required. Our starting point is a remarkable “equalization property” of the Arcsine distribution: it yields an identical convergence rate for all quadratic functions. A key technical insight is that martingale arguments extend this phenomenon to all separable convex functions. We interpret this equalization as an extreme form of hedging: by using this random distribution over stepsizes, Gradient Descent converges at exactly the same rate for all functions in the function class.

Article

Download

View PDF