Stochastic Aspects of Dynamical Low-Rank Approximation in the Context of Machine Learning

The central challenges of today’s neural network architectures are the prohibitive memory footprint and the training cost associated with determining optimal weights and biases. A large portion of machine learning research is therefore dedicated to constructing memory-efficient training methods. One promising approach is dynamical low-rank training (DLRT), which represents and trains the network parameters as low-rank factorizations. While DLRT is equipped with several beneficial properties, analytic results are currently limited to deterministic gradient flows. In this work, we show that dynamical low-rank training combined with stochastic gradient and momentum methods satisfies certain robustness and descent guarantees. Moreover, we prove convergence to stationary points under the assumption that basis updates are omitted after a fixed number of iterations, which agrees with the main numerical behavior observed in applications of this method.
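To make the setting concrete, the following is a minimal, hypothetical sketch of a low-rank factorized layer trained with SGD and momentum, in which the basis factors are frozen after an initial training phase so that only the small coefficient matrix continues to be updated. The class `LowRankLinear`, its `freeze_bases` method, and all dimensions and hyperparameters are illustrative assumptions; this is not the paper's actual DLRT integrator, which uses dedicated basis and coefficient update steps.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Linear layer whose weight is stored as a rank-r factorization W ≈ U S Vᵀ.

    Illustrative sketch only; not the DLRT integrator analyzed in the paper.
    """
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # Factors: U (out × r), S (r × r), V (in × r). Only these are stored,
        # so the memory footprint scales with the rank rather than out × in.
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank ** 0.5)
        self.S = nn.Parameter(torch.eye(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) / rank ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x Wᵀ + b = x V Sᵀ Uᵀ + b, without ever forming the full weight matrix.
        return x @ self.V @ self.S.T @ self.U.T + self.bias

    def freeze_bases(self) -> None:
        # After a fixed number of iterations, stop updating the bases U and V
        # and train only the small coefficient matrix S (the regime in which
        # the convergence result is stated).
        self.U.requires_grad_(False)
        self.V.requires_grad_(False)

# Hypothetical usage: SGD with momentum on all factors, then freeze the bases.
layer = LowRankLinear(784, 128, rank=16)
optimizer = torch.optim.SGD(layer.parameters(), lr=1e-2, momentum=0.9)
```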
