Bilevel (two-level) stochastic optimization formulations have become instrumental in a number of
machine learning contexts, such as continual learning, neural architecture search, adversarial
learning, and hyperparameter tuning. Practical stochastic bilevel optimization problems become
challenging when the number of variables is large or when constraints are present.
In this paper, we introduce a bilevel stochastic gradient method for bilevel problems with
lower-level constraints. We also present a comprehensive convergence theory that covers all
inexact calculations of the adjoint gradient (also called the hypergradient) and addresses both
the lower-level unconstrained and constrained cases (the standard form of the adjoint gradient
in the unconstrained case is recalled below). To promote the use of bilevel optimization
in large-scale learning, we introduce a practical bilevel stochastic gradient method (BSG-1)
that does not require second-order derivatives and, in the lower-level unconstrained case,
avoids any linear system solves and matrix-vector products.
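For reference, in the lower-level unconstrained case the adjoint gradient mentioned above is the
classical hypergradient of the bilevel problem. The sketch below uses assumed notation (f^u for
the upper-level objective and f^\ell for the lower-level objective), which may differ from the
notation used in the report:

\[
  \nabla f(x) \;=\; \nabla_x f^u\bigl(x, y(x)\bigr)
  \;-\; \nabla^2_{xy} f^\ell\bigl(x, y(x)\bigr)
  \bigl[\nabla^2_{yy} f^\ell\bigl(x, y(x)\bigr)\bigr]^{-1}
  \nabla_y f^u\bigl(x, y(x)\bigr),
  \qquad y(x) \in \arg\min_{y} f^\ell(x, y).
\]

Inexactness in this adjoint gradient arises from solving the linear system involving
\nabla^2_{yy} f^\ell only approximately and from replacing gradients and Hessians by stochastic
estimates; avoiding the system solve and the associated matrix-vector products is what BSG-1
targets in the unconstrained case.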
Citation
T. Giovannelli, G. Kent, and L. N. Vicente, Inexact bilevel stochastic gradient methods for constrained and unconstrained lower-level problems, ISE Technical Report 21T-025, Lehigh University.