Stochastic Aspects of Dynamical Low-Rank Approximation in the Context of Machine Learning

The central challenges of today's neural network architectures are their prohibitive
memory footprint and the training costs associated with determining optimal weights and biases. A
large portion of research in machine learning is therefore dedicated to constructing memory-efficient
training methods. One promising approach is dynamical low-rank training (DLRT), which represents
and trains the network parameters as a low-rank factorization. While DLRT is equipped with several beneficial
properties, analytic results are currently limited to deterministic gradient flows. In this work, we show
that dynamical low-rank training, in combination with stochastic gradient and momentum methods,
satisfies descent guarantees, and we prove its convergence to an optimal point.
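
To make the setting concrete, the following is a minimal sketch of the generic idea behind such methods: a weight matrix that is stored and trained only through low-rank factors, updated with minibatch stochastic gradients and heavy-ball momentum. This is an illustration, not the paper's DLRT algorithm; the factor names U and V, the rank r, the layer sizes, and the toy least-squares objective are assumptions introduced here for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4                          # layer sizes and rank (assumed)
W_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) / np.sqrt(r)

# The m x n weight matrix is stored only through its factors:
# m*r + n*r = 384 entries instead of m*n = 2048.
U = 0.1 * rng.standard_normal((m, r))
V = 0.1 * rng.standard_normal((n, r))
mU, mV = np.zeros_like(U), np.zeros_like(V)  # heavy-ball momentum buffers

lr, beta, batch = 5e-3, 0.9, 16
for step in range(2001):
    X = rng.standard_normal((batch, n))      # stochastic minibatch
    Y = X @ W_true.T                         # regression targets
    XV = X @ V                               # apply the layer without ever
    resid = XV @ U.T - Y                     # forming W = U @ V.T explicitly
    gU = resid.T @ XV / batch                # gradients of 0.5*||resid||^2/batch
    gV = X.T @ (resid @ U) / batch           # with respect to the factors
    mU = beta * mU + gU                      # momentum accumulation
    mV = beta * mV + gV
    U -= lr * mU                             # descent step on the factors
    V -= lr * mV
    if step % 400 == 0:
        print(step, 0.5 * np.sum(resid**2) / batch)
```

Note that the forward pass contracts X with V first, so the full matrix U @ V.T is never materialized; this is the source of the memory savings that motivate low-rank training.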