Stochastic Gradient Methods with Online Scaling

This paper introduces Stochastic Online Scaled Gradient Methods (SOSGM), a generalization of the recently developed adaptive preconditioning framework in \cite{gao2025gradient,chu2025gradient} to stochastic optimization. Under standard assumptions, we establish convergence guarantees for SOSGM when it is combined with large batch sizes or variance reduction. SOSGM is compatible with popular diagonal and/or low-rank preconditioners as well as heavy-ball momentum, while keeping memory and computational costs comparable to those of Adam. Extensive numerical experiments demonstrate the strong empirical performance of SOSGM. Using a diagonal preconditioner, SOSGM and its variants substantially outperform existing adaptive first-order methods across a range of statistical learning tasks.
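
To make the idea of "online scaling" with a diagonal preconditioner concrete, here is a minimal sketch under assumptions that the abstract does not state. We assume the diagonal preconditioner P is itself learned online: after each scaled step x_plus = x - P * g, P takes an online gradient step on the surrogate h(P) = (f_B(x - P g) - f_B(x)) / ||g||^2, where both minibatch gradients are computed on the same batch B. The toy noisy quadratic, the function name `sosgm_diag_sketch`, and all parameter values are hypothetical. This is not the paper's algorithm; it omits momentum, variance reduction, low-rank preconditioners, and any safeguards the authors may use.

```python
import random


def sosgm_diag_sketch(a, x0, steps=300, batch=256, sigma=1.0, eta=0.1, seed=0):
    """Hypothetical sketch of SGD with an online-learned diagonal preconditioner.

    Assumed sample loss (toy problem, for illustration only):
        f(x; xi) = 0.5 * sum_i a[i] * (x[i] - xi[i])**2,  xi ~ N(0, sigma^2 I),
    so the expected objective is minimized at x = 0.
    Each iteration (assumed update rules):
        x_plus = x - P * g                       # scaled stochastic gradient step
        P     <- P + eta * g_plus * g / ||g||^2  # online gradient step on the surrogate
                                                 # h(P) = (f_B(x - P g) - f_B(x)) / ||g||^2
    g and g_plus are minibatch gradients at x and x_plus on the SAME batch.
    """
    rng = random.Random(seed)
    n = len(a)
    x = list(x0)
    P = [0.0] * n  # diagonal preconditioner, initialized at zero
    for _ in range(steps):
        # Draw one minibatch; for this quadratic only the batch mean of xi matters.
        xi_bar = [0.0] * n
        for _ in range(batch):
            for i in range(n):
                xi_bar[i] += rng.gauss(0.0, sigma)
        xi_bar = [v / batch for v in xi_bar]

        g = [a[i] * (x[i] - xi_bar[i]) for i in range(n)]
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # zero minibatch gradient: the surrogate is undefined
        x_plus = [x[i] - P[i] * g[i] for i in range(n)]
        g_plus = [a[i] * (x_plus[i] - xi_bar[i]) for i in range(n)]

        # The partial derivative of the surrogate w.r.t. P[i] is -g_plus[i] * g[i] / ||g||^2,
        # so an online gradient-descent step on P adds eta times its negative.
        P = [P[i] + eta * g_plus[i] * g[i] / g_sq for i in range(n)]
        x = x_plus
    return x, P


a = [1.0, 10.0]
x, P = sosgm_diag_sketch(a, x0=[5.0, 5.0])
F = 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
print(f"x = {x}, P = {P}, F(x) = {F:.4f}")
```

On this toy quadratic the surrogate step reduces to P[i] += eta * w_i * (1 - a[i] * P[i]) with w_i = g[i]^2 / ||g||^2 in [0, 1]. Choosing eta = 0.1 keeps eta * max(a) <= 1, so each P[i] moves monotonically from 0 toward the ideal diagonal scaling 1/a[i] without overshooting. The learned P therefore plays a role similar to the per-coordinate step sizes of Adam, and it requires storing only one extra vector.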
