Are we there yet? Manifold identification of gradient-related proximal methods

In machine learning, models that generalize better often generate outputs that lie on a low-dimensional manifold. Several recent works have separately shown that certain proximal methods identify this manifold in a finite number of iterations. In this work, we provide a unified view by giving a simple condition under which any proximal method using a constant step size can achieve finite-iteration …
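As a rough illustration of the setting, here is a minimal sketch (not code from the paper) of a proximal method with a constant step size 1/L; the function names, the lasso example, and all problem data are assumptions of mine. The identified "manifold" in this toy case is the set of points sharing the final sparsity pattern.

```python
# A minimal sketch, assuming a smooth loss f with L-Lipschitz gradient and a
# regularizer g whose proximal operator is available in closed form.
# None of these names or problem choices come from the paper.
import numpy as np

def proximal_gradient(grad_f, prox_g, x0, L, n_iters=500):
    """Iterate x_{k+1} = prox_{g/L}(x_k - (1/L) grad_f(x_k)) with constant step 1/L."""
    x = x0.copy()
    for _ in range(n_iters):
        x = prox_g(x - grad_f(x) / L, 1.0 / L)
    return x

def prox_l1(z, t):
    """Soft-thresholding: proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

if __name__ == "__main__":
    # Toy L1-regularized least squares with synthetic data.
    rng = np.random.default_rng(0)
    A, b, lam = rng.standard_normal((30, 10)), rng.standard_normal(30), 1.0
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of grad f
    x = proximal_gradient(lambda x: A.T @ (A @ x - b),
                          lambda z, t: prox_l1(z, lam * t),
                          np.zeros(10), L)
    print("nonzero coordinates at convergence:", np.nonzero(x)[0])
```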

Let’s Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper …
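For concreteness, the toy loop below (my own sketch, not the paper's code) runs BCD on a convex quadratic and makes the three choices explicit: a fixed block partition, a greedy Gauss-Southwell-style block selection rule, and an exact block update. The matrix, blocks, and names are illustrative assumptions.

```python
# A minimal sketch, assuming f(x) = 0.5 x^T Q x - c^T x with Q positive definite.
import numpy as np

def greedy_bcd(Q, c, blocks, n_iters=500):
    x = np.zeros(len(c))
    for _ in range(n_iters):
        grad = Q @ x - c
        # Block selection rule: greedily pick the block with the largest gradient norm.
        b = max(range(len(blocks)), key=lambda i: np.linalg.norm(grad[blocks[i]]))
        idx = blocks[b]
        # Block update rule: exact minimization over the chosen block,
        # holding the other blocks fixed (solves the block optimality condition).
        x[idx] -= np.linalg.solve(Q[np.ix_(idx, idx)], grad[idx])
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
Q, c = M @ M.T + np.eye(6), rng.standard_normal(6)
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]  # block partitioning strategy
x = greedy_bcd(Q, c, blocks)
print("distance to the exact minimizer:", np.linalg.norm(x - np.linalg.solve(Q, c)))
```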

“Active-set complexity” of proximal gradient: How long does it take to find the sparsity pattern?

Proximal gradient methods have been found to be highly effective for solving minimization problems with non-negative constraints or L1-regularization. Under suitable nondegeneracy conditions, it is known that these algorithms identify the optimal sparsity pattern for these types of problems in a finite number of iterations. However, it is not known how many iterations this may …
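The following toy experiment of mine (with assumed synthetic data, not taken from the paper) makes the question concrete: it runs the proximal gradient method on an L1-regularized least-squares problem and records the first iteration after which the sparsity pattern never changes again.

```python
# A minimal sketch assuming f(x) = 0.5 ||Ax - b||^2 and g = lam * ||.||_1;
# the data, lam, and iteration budget are arbitrary choices of mine.
import numpy as np

def support_history(A, b, lam, n_iters=2000):
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    history = []
    for _ in range(n_iters):
        z = x - A.T @ (A @ x - b) / L            # gradient step with constant size 1/L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding (prox step)
        history.append(tuple(np.nonzero(x)[0]))
    return history

rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
history = support_history(A, b, lam=2.0)
final = history[-1]
identified = next(k for k in range(len(history)) if all(s == final for s in history[k:]))
print(f"sparsity pattern {final} is fixed from iteration {identified} onward")
```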

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Łojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Łojasiewicz (PL) …
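For reference, the PL inequality and the standard one-step linear-rate argument it enables for gradient descent with step size 1/L can be written out as follows; this is the textbook calculation sketched from the abstract's description, not a transcription of the paper's proof.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% PL inequality: for some \mu > 0 and all x, with f^* the minimum value of f,
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^2 \;\ge\; \mu \bigl(f(x) - f^\ast\bigr).
\]
% For L-smooth f, the gradient step x_{k+1} = x_k - (1/L) \nabla f(x_k) satisfies
\[
  f(x_{k+1}) \;\le\; f(x_k) - \tfrac{1}{2L}\,\lVert \nabla f(x_k) \rVert^2
  \;\le\; f(x_k) - \tfrac{\mu}{L}\bigl(f(x_k) - f^\ast\bigr),
\]
% so subtracting f^* from both sides gives a global linear rate,
% with no convexity assumption:
\[
  f(x_{k+1}) - f^\ast \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(x_k) - f^\ast\bigr).
\]
\end{document}
```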