Central challenges of today’s neural network architectures are their prohibitive
memory footprint and the training cost of determining optimal weights and biases. A
large portion of research in machine learning is therefore dedicated to constructing memory-efficient
training methods. One promising approach is dynamical low-rank training (DLRT), which represents
and trains parameters as a low-rank factorization. While DLRT is equipped with several beneficial
properties, analytic results are currently limited to deterministic gradient flows. In this work, we show
that dynamical low-rank training in combination with stochastic gradient and momentum methods
satisfies descent guarantees, and we prove its convergence to an optimal point.
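The setting above can be illustrated with a minimal sketch; this is not the paper's DLRT scheme, but a hypothetical example of the ingredients it analyzes: a weight matrix stored and trained as a rank-r factorization `W = U @ V`, updated by minibatch stochastic gradients with heavy-ball momentum. All names, shapes, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): train a low-rank
# factorization W = U @ V with minibatch SGD + heavy-ball momentum
# on a synthetic least-squares problem.
rng = np.random.default_rng(0)
m, n, r = 20, 15, 3

# Synthetic low-rank target and training data (assumed setup).
W_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
X = rng.standard_normal((100, m))
Y = X @ W_true

# Low-rank factors: m*r + r*n parameters instead of m*n.
U = 0.1 * rng.standard_normal((m, r))
V = 0.1 * rng.standard_normal((r, n))
mU, mV = np.zeros_like(U), np.zeros_like(V)

lr, beta = 1e-3, 0.9

def loss(U, V):
    return 0.5 * np.mean((X @ (U @ V) - Y) ** 2)

losses = [loss(U, V)]
for step in range(500):
    # Stochastic gradient: sample a minibatch of 16 rows.
    idx = rng.choice(len(X), size=16, replace=False)
    Xb, Yb = X[idx], Y[idx]
    R = Xb @ (U @ V) - Yb               # batch residual
    gU = (Xb.T @ R) @ V.T / len(idx)    # dL/dU via chain rule
    gV = U.T @ (Xb.T @ R) / len(idx)    # dL/dV via chain rule
    # Heavy-ball momentum updates applied to the factors directly.
    mU = beta * mU - lr * gU
    mV = beta * mV - lr * gV
    U, V = U + mU, V + mV
    losses.append(loss(U, V))

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Note that the gradients act on the factors `U` and `V`, never on the full matrix `W`, so the full `m*n` array is never materialized during training; the paper's analysis concerns descent and convergence guarantees for updates of this stochastic, momentum-augmented type.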