Stochastic Aspects of Dynamical Low-Rank Approximation in the Context of Machine Learning

The central challenges of today’s neural network architectures are their prohibitive
memory footprint and the training costs associated with determining optimal weights and biases. A
large portion of research in machine learning is therefore dedicated to constructing memory-efficient
training methods. One promising approach is dynamical low-rank training (DLRT), which represents
and trains parameters as a low-rank factorization. While DLRT is equipped with several beneficial
properties, analytic results are currently limited to deterministic gradient flows. In this work, we show
that dynamical low-rank training, combined with stochastic gradient and momentum methods, satisfies
descent guarantees, and we prove its convergence to an optimal point.
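
As a concrete illustration of what training parameters as a low-rank factorization can look like, the following is a minimal NumPy sketch of one fixed-rank update in the style of the basis-update & Galerkin ("unconventional") low-rank integrator commonly used for DLRT, driven by a gradient estimate that may be stochastic. The function name dlrt_step, the step size, and the toy least-squares objective are illustrative assumptions, not the exact scheme analyzed in this work.

```python
import numpy as np

def dlrt_step(U, S, V, grad_W, lr):
    """One fixed-rank DLRT-style update of W ~= U @ S @ V.T.

    Basis-update & Galerkin ("unconventional") integrator, driven by a
    (possibly stochastic) gradient estimate grad_W of shape (n, m).
    U: (n, r) orthonormal, S: (r, r), V: (m, r) orthonormal.
    """
    # K-step: update the column-space basis.
    K = U @ S - lr * grad_W @ V           # (n, r)
    U_new, _ = np.linalg.qr(K)            # re-orthonormalize
    M = U_new.T @ U                       # change-of-basis matrix (r, r)

    # L-step: update the row-space basis.
    L = V @ S.T - lr * grad_W.T @ U       # (m, r)
    V_new, _ = np.linalg.qr(L)
    N = V_new.T @ V

    # S-step: Galerkin update of the small coefficient matrix.
    S_new = M @ S @ N.T - lr * U_new.T @ grad_W @ V_new
    return U_new, S_new, V_new

# Toy usage: minimize 0.5 * ||U S V^T - W_target||_F^2, whose gradient
# with respect to the full matrix W is simply (U S V^T - W_target).
rng = np.random.default_rng(0)
n, m, r = 50, 40, 5
W_target = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
S = np.eye(r)
for _ in range(200):
    grad = U @ S @ V.T - W_target         # a minibatch estimate would be used identically
    U, S, V = dlrt_step(U, S, V, grad, lr=0.1)
print(np.linalg.norm(U @ S @ V.T - W_target))  # residual of the low-rank fit
```

Note that only the factors U, S, and V are ever stored or updated, i.e. (n + m + r)·r numbers instead of the n·m entries of the full weight matrix, which is the source of the memory savings referred to above.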