A nearly linearly convergent first-order method for nonsmooth functions with quadratic growth

Classical results show that gradient descent converges linearly to minimizers of smooth strongly convex functions. A natural question is whether there exists a locally nearly linearly convergent method for nonsmooth functions with quadratic growth. This work designs such a method for a wide class of nonsmooth and nonconvex locally Lipschitz functions, including max-of-smooth, Shapiro’s decomposable … Read more

A superlinearly convergent subgradient method for sharp semismooth problems

Subgradient methods comprise a fundamental class of nonsmooth optimization algorithms. Classical results show that certain subgradient methods converge sublinearly for general Lipschitz convex functions and converge linearly for convex functions that grow sharply away from solutions. Recent work has moreover extended these results to certain nonconvex problems. In this work we seek to improve the … Read more

Subgradient methods near active manifolds: saddle point avoidance, local convergence, and asymptotic normality

Nonsmooth optimization problems arising in practice, whether in signal processing, statistical estimation, or modern machine learning, tend to exhibit beneficial smooth substructure: their domains stratify into “active manifolds” of smooth variation, which common proximal algorithms “identify” in finite time. Identification then entails a transition to smooth dynamics, and permits the use of second-order information for … Read more

Escaping strict saddle points of the Moreau envelope in nonsmooth optimization

Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization, by analyzing an inexact analogue of a stochastically perturbed gradient method applied to the Moreau envelope. The main conclusion is that a variety of algorithms for nonsmooth optimization … Read more

Active strict saddles in nonsmooth optimization

We introduce a geometrically transparent strict saddle property for nonsmooth functions. This property guarantees that simple proximal algorithms on weakly convex problems converge only to local minimizers, when randomly initialized. We argue that the strict saddle property may be a realistic assumption in applications, since it provably holds for generic semi-algebraic optimization problems. ArticleDownload View … Read more

Robust stochastic optimization with the proximal point method

Standard results in stochastic convex optimization bound the number of samples that an algorithm needs to generate a point with small function value in expectation. In this work, we show that a wide class of such algorithms on strongly convex problems can be augmented with sub-exponential confidence bounds at an overhead cost that is only … Read more

Stochastic algorithms with geometric step decay converge linearly on sharp functions

Stochastic (sub)gradient methods require step size schedule tuning to perform well in practice. Classical tuning strategies decay the step size polynomially and lead to optimal sublinear rates on (strongly) convex problems. An alternative schedule, popular in nonconvex optimization, is called geometric step decay and proceeds by halving the step size after every few epochs. In … Read more

Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence

The task of recovering a low-rank matrix from its noisy linear measurements plays a central role in computational science. Smooth formulations of the problem often exhibit an undesirable phenomenon: the condition number, classically defined, scales poorly with the dimension of the ambient space. In contrast, we here show that in a variety of concrete circumstances, … Read more

Composite optimization for robust blind deconvolution

The blind deconvolution problem seeks to recover a pair of vectors from a set of rank one bilinear measurements. We consider a natural nonsmooth formulation of the problem and show that under standard statistical assumptions, its moduli of weak convexity, sharpness, and Lipschitz continuity are all dimension independent. This phenomenon persists even when up to … Read more

Stochastic model-based minimization under high-order growth

Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively sample and minimize stochastic convex models of the objective function. Assuming that the one-sided approximation quality and the variation of the models is controlled by a Bregman divergence, we show that the scheme drives a natural stationarity measure to zero at the rate $O(k^{-1/4})$. … Read more