Bilevel stochastic methods for optimization and machine learning: Bilevel stochastic descent and DARTS

Bilevel (two-level) stochastic optimization formulations have become instrumental in a number of machine learning contexts, such as neural architecture search, continual learning, adversarial learning, and hyperparameter tuning. Practical stochastic bilevel optimization problems become challenging when the number of variables is high or when constraints are present. The goal of this paper is twofold. First, we aim to promote the use of bilevel optimization in large-scale learning, and we introduce a practical bilevel stochastic gradient method (BSG-1) that requires neither lower-level second-order derivatives nor linear system solves (and thus dispenses with matrix-vector products). Our BSG-1 method stays close to first-order principles, which allows it to achieve better performance than methods that do not, such as DARTS. Second, we develop bilevel stochastic gradient descent for bilevel problems with lower-level constraints, and we introduce a convergence theory that covers both the unconstrained and constrained cases and abstracts as much as possible from the specifics of the bilevel gradient calculation.

Citation

T. Giovannelli, G. Kent, and L. N. Vicente, Bilevel stochastic methods for optimization and machine learning: Bilevel stochastic descent and DARTS, ISE Technical Report 21T-025, Lehigh University.
