Stochastic subgradient method converges on tame functions

This work considers the question: what convergence guarantees does the stochastic subgradient method have in the absence of smoothness and convexity? We prove that the stochastic subgradient method, on any semialgebraic locally Lipschitz function, produces limit points that are all first-order stationary. More generally, our result applies to any function with a Whitney stratifiable graph. …
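
Below is a minimal sketch of the stochastic subgradient iteration that the result covers, applied to a toy least-absolute-deviations objective, which is nonsmooth, semialgebraic, and locally Lipschitz. The data, step-size schedule, and subgradient selection are illustrative choices, not taken from the paper.

```python
import numpy as np

# Toy problem: minimize f(x) = (1/m) * sum_i |a_i^T x - b_i|,
# a nonsmooth, semialgebraic, locally Lipschitz function.
rng = np.random.default_rng(0)
m, d = 200, 5
A = rng.normal(size=(m, d))
x_true = rng.normal(size=d)
b = A @ x_true

def stochastic_subgradient(x0, steps=20000):
    x = x0.copy()
    for k in range(steps):
        i = rng.integers(m)                    # sample one term of the sum
        r = A[i] @ x - b[i]
        g = np.sign(r) * A[i]                  # a subgradient of |a_i^T x - b_i|
        alpha = 1.0 / np.sqrt(k + 1)           # illustrative diminishing step size
        x = x - alpha * g
    return x

x_hat = stochastic_subgradient(np.zeros(d))
print(np.linalg.norm(x_hat - x_true))
```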

Stochastic model-based minimization of weakly convex functions

We consider an algorithm that successively samples and minimizes stochastic models of the objective function. We show that under weak-convexity and Lipschitz conditions, the algorithm drives the expected norm of the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$. Our result yields the first complexity guarantees for the stochastic proximal point algorithm …
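
As an illustration of one member of this family of model-based methods, here is a sketch of a stochastic proximal point step on a toy absolute-loss regression problem. The closed-form prox step exploits the specific sampled loss $|a_i^T x - b_i|$; the data, prox parameter schedule, and iteration count are made up and are not the paper's.

```python
import numpy as np

# Toy objective: f(x) = (1/m) * sum_i |a_i^T x - b_i| (convex, hence weakly convex).
rng = np.random.default_rng(1)
m, d = 200, 5
A = rng.normal(size=(m, d))
b = A @ rng.normal(size=d)

def stochastic_proximal_point(x0, steps=3000):
    x = x0.copy()
    for k in range(steps):
        i = rng.integers(m)
        a, r = A[i], A[i] @ x - b[i]
        c = a @ a
        beta = 1.0 / np.sqrt(k + 1)            # illustrative prox parameter schedule
        # Exact minimizer of the sampled model |a^T y - b_i| + ||y - x||^2 / (2*beta):
        # move along a by the clipped residual.
        x = x - (np.clip(r, -beta * c, beta * c) / c) * a
    return x

x_hat = stochastic_proximal_point(np.zeros(d))
print(np.abs(A @ x_hat - b).mean())            # average absolute residual
```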

Stochastic subgradient method converges at the rate $O(k^{-1/4})$ on weakly convex functions

We prove that the projected stochastic subgradient method, applied to a weakly convex problem, drives the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$.
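
A minimal sketch of the projected stochastic subgradient method on a toy weakly convex problem (robust phase retrieval over a Euclidean ball) follows. The loss, constraint radius, initialization, and step-size schedule are illustrative assumptions; the paper's guarantee concerns the gradient of the Moreau envelope, which is not computed here.

```python
import numpy as np

# Toy weakly convex problem: f(x) = (1/m) * sum_i |<a_i, x>^2 - b_i|, ||x|| <= radius.
rng = np.random.default_rng(2)
m, d = 400, 10
A = rng.normal(size=(m, d))
x_true = rng.normal(size=d)
b = (A @ x_true) ** 2
radius = 2.0 * np.linalg.norm(x_true)

def project_ball(x, r):
    n = np.linalg.norm(x)
    return x if n <= r else (r / n) * x

def projected_stochastic_subgradient(x0, steps=20000):
    x = x0.copy()
    for k in range(steps):
        i = rng.integers(m)
        inner = A[i] @ x
        g = np.sign(inner**2 - b[i]) * 2.0 * inner * A[i]   # stochastic subgradient
        alpha = 0.01 / np.sqrt(k + 1)                       # illustrative step sizes
        x = project_ball(x - alpha * g, radius)
    return x

x_hat = projected_stochastic_subgradient(x_true + 0.3 * rng.normal(size=d))
print(np.abs((A @ x_hat) ** 2 - b).mean())
```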

The nonsmooth landscape of phase retrieval

We consider a popular nonsmooth formulation of the real phase retrieval problem. We show that under standard statistical assumptions, a simple subgradient method converges linearly when initialized within a constant relative distance of an optimal solution. Seeking to understand the distribution of the stationary points of the problem, we complete the paper by proving that …
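
The sketch below runs a subgradient method with a Polyak-type step size on a toy noiseless instance of this nonsmooth phase retrieval formulation. The step-size rule is one natural choice and may differ from the exact method analyzed in the paper, and the data and initialization (within a small relative distance of the signal) are made up.

```python
import numpy as np

# Noiseless real phase retrieval: f(x) = (1/m) * sum_i |<a_i, x>^2 - b_i|, min value 0.
rng = np.random.default_rng(3)
m, d = 500, 10
A = rng.normal(size=(m, d))
x_true = rng.normal(size=d)
b = (A @ x_true) ** 2

def f_and_subgrad(x):
    inner = A @ x
    res = inner**2 - b
    return np.abs(res).mean(), (np.sign(res) * 2.0 * inner) @ A / m

def polyak_subgradient(x0, steps=200):
    x = x0.copy()
    for _ in range(steps):
        fval, g = f_and_subgrad(x)
        if g @ g == 0:
            break
        x = x - (fval / (g @ g)) * g             # Polyak step (optimal value is 0)
    return x

x0 = x_true + 0.1 * np.linalg.norm(x_true) * rng.normal(size=d) / np.sqrt(d)
x_hat = polyak_subgradient(x0)
print(min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true)))  # sign ambiguity
```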

A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine Learning

Machine learning theory typically assumes that training data is unbiased and not adversarially generated. When real training data deviates from these assumptions, trained models make erroneous predictions, sometimes with disastrous effects. Robust losses, such as the Huber norm, are designed to mitigate the effects of such contaminated data, but they are limited to the regression …
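
The SMART-based method developed in the paper is not reproduced here; purely as background on robust losses, the snippet below fits a linear model under the Huber loss with plain gradient descent on synthetic data containing gross outliers. All names, data, and parameters are illustrative.

```python
import numpy as np

# Huber loss: quadratic for small residuals, linear for large ones,
# which bounds the influence of contaminated samples.
def huber_grad(r, delta=1.0):
    return np.clip(r, -delta, delta)             # derivative of the Huber function

rng = np.random.default_rng(4)
m, d = 300, 5
A = rng.normal(size=(m, d))
x_true = rng.normal(size=d)
b = A @ x_true
b[:30] += 50.0 * rng.normal(size=30)             # gross outliers in 10% of the labels

x = np.zeros(d)
for _ in range(500):                             # gradient descent on the Huber objective
    x -= 0.5 * (A.T @ huber_grad(A @ x - b)) / m
print(np.linalg.norm(x - x_true))                # error stays small despite the corrupted labels
```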

The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM

We introduce the Stochastic Asynchronous Proximal Alternating Linearized Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient method for solving nonconvex, nonsmooth optimization problems. SAPALM is the first asynchronous parallel optimization method that provably converges on a large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the best known rates of convergence — among …
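
As a rough illustration of the update pattern that SAPALM parallelizes, here is a serial, synchronous sketch of block coordinate stochastic proximal-gradient steps on a toy two-block problem. The asynchrony, the delay analysis, and the nonconvex setting treated in the paper are all omitted, and the problem data and parameters are invented.

```python
import numpy as np

# Toy two-block problem:
#   minimize (1/(2m)) * ||A x + B y - c||^2 + lam * ||x||_1 + indicator(y >= 0)
rng = np.random.default_rng(5)
m, p, q = 500, 20, 15
A, B = rng.normal(size=(m, p)), rng.normal(size=(m, q))
c = A @ rng.normal(size=p) + B @ np.abs(rng.normal(size=q))
lam = 0.1

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x, y = np.zeros(p), np.zeros(q)
batch, step = 50, 0.02
for k in range(5000):
    idx = rng.choice(m, size=batch, replace=False)       # minibatch stochastic gradient
    r = A[idx] @ x + B[idx] @ y - c[idx]
    if k % 2 == 0:                                       # block x: prox = soft-threshold
        x = soft_threshold(x - step * (A[idx].T @ r) / batch, step * lam)
    else:                                                # block y: prox = projection onto >= 0
        y = np.maximum(y - step * (B[idx].T @ r) / batch, 0.0)
print(0.5 * np.mean((A @ x + B @ y - c) ** 2))
```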

The Asynchronous PALM Algorithm for Nonsmooth Nonconvex Problems

We introduce the Asynchronous PALM algorithm, a new extension of the Proximal Alternating Linearized Minimization (PALM) algorithm for solving nonconvex nonsmooth optimization problems. Like the PALM algorithm, each step of the Asynchronous PALM algorithm updates a single block of coordinates; but unlike the PALM algorithm, the Asynchronous PALM algorithm eliminates the need for sequential updates …
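
For context, the underlying sequential PALM iteration alternates a prox-linearized step over each block with a block-specific step size; a minimal sketch on nonnegative matrix factorization appears below. The asynchronous variant, which is the paper's contribution, is not reproduced, and the problem sizes and iteration count are arbitrary.

```python
import numpy as np

# PALM-style alternating updates for nonnegative matrix factorization:
#   minimize 0.5 * ||M - U @ V||_F^2  subject to  U >= 0, V >= 0   (nonconvex, nonsmooth)
rng = np.random.default_rng(6)
n, r, p = 60, 5, 40
M = np.abs(rng.normal(size=(n, r))) @ np.abs(rng.normal(size=(r, p)))
U = np.abs(rng.normal(size=(n, r)))
V = np.abs(rng.normal(size=(r, p)))

for _ in range(500):
    # Block U: gradient step on the smooth coupling term, then prox (projection onto U >= 0),
    # with step size set by the block Lipschitz constant.
    LU = np.linalg.norm(V @ V.T, 2) + 1e-12
    U = np.maximum(U - ((U @ V - M) @ V.T) / LU, 0.0)
    # Block V: same pattern with its own block Lipschitz constant.
    LV = np.linalg.norm(U.T @ U, 2) + 1e-12
    V = np.maximum(V - (U.T @ (U @ V - M)) / LV, 0.0)

print(0.5 * np.linalg.norm(M - U @ V) ** 2)
```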

SMART: The Stochastic Monotone Aggregated Root-Finding Algorithm

We introduce the Stochastic Monotone Aggregated Root-Finding (SMART) algorithm, a new randomized operator-splitting scheme for finding roots of finite sums of operators. These algorithms are similar to the growing class of incremental aggregated gradient algorithms, which minimize finite sums of functions; the difference is that we replace gradients of functions with black boxes called operators, which …
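
To make the "gradients replaced by operators" idea concrete, here is a sketch of a SAGA-flavored incremental aggregated scheme for finding a root of an average of simple affine monotone operators. SMART itself is considerably more general (arbitrary samplings, asynchrony, and other variance-reduction choices), so treat this only as an illustration with made-up data and step size.

```python
import numpy as np

# Find a root of (1/n) * sum_i S_i(x) with S_i(x) = a_i * (a_i^T x - b_i);
# each S_i is monotone, and the root of the average is the least-squares solution.
rng = np.random.default_rng(7)
n, d = 300, 8
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star

def S(i, x):
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
table = np.array([S(i, x) for i in range(n)])    # stored operator evaluations
avg = table.mean(axis=0)
step = 1.0 / (3.0 * (A**2).sum(axis=1).max())    # conservative SAGA-style step
for _ in range(20000):
    i = rng.integers(n)
    fresh = S(i, x)
    x = x - step * (fresh - table[i] + avg)      # variance-reduced operator step
    avg += (fresh - table[i]) / n                # keep the table average current
    table[i] = fresh
print(np.linalg.norm(x - x_star))
```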

An $O(n\log(n))$ Algorithm for Projecting Onto the Ordered Weighted $\ell_1$ Norm Ball

The ordered weighted $\ell_1$ (OWL) norm is a newly developed generalization of the Octagonal Shrinkage and Clustering Algorithm for Regression (OSCAR) norm. This norm has desirable statistical properties and can be used to perform simultaneous clustering and regression. In this paper, we show how to compute the projection of an $n$-dimensional vector onto the OWL …
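
The $O(n\log(n))$ projection onto the OWL norm ball is the paper's contribution and is not reproduced here; the snippet below only shows how the norm itself is evaluated, with an arbitrary weight vector, which makes clear why sorting (and hence an $n\log(n)$ cost) enters the picture.

```python
import numpy as np

# Ordered weighted l1 (OWL) norm: sort |x| in nonincreasing order and take the
# inner product with a fixed nonincreasing, nonnegative weight vector w.
def owl_norm(x, w):
    return np.sort(np.abs(x))[::-1] @ w

rng = np.random.default_rng(8)
x = rng.normal(size=6)
w = np.array([1.0, 0.8, 0.6, 0.4, 0.2, 0.1])     # illustrative nonincreasing weights
print(owl_norm(x, w))
# With all weights equal, the OWL norm is a scaled l1 norm;
# with w = (1, 0, ..., 0) it reduces to the l-infinity norm.
```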

A Three-Operator Splitting Scheme and its Optimization Applications

Operator splitting schemes have been successfully used in computational sciences to reduce complex problems into a series of simpler subproblems. Since the 1950s, these schemes have been widely used to solve problems in PDEs and control. Recently, large-scale optimization problems in machine learning, signal processing, and imaging have created a resurgence of interest in operator-splitting based …
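
A minimal sketch of the kind of three-operator iteration studied in the paper is given below, splitting a toy objective into an $\ell_1$ term, a nonnegativity constraint, and a smooth least-squares term. The specific problem, step size, and relaxation choice are illustrative assumptions rather than the paper's prescriptions.

```python
import numpy as np

# Three-operator splitting for  minimize f(x) + g(x) + h(x)  with
# f = lam*||x||_1 (prox: soft-threshold), g = indicator(x >= 0) (prox: projection),
# and smooth h(x) = 0.5*||A x - b||^2 accessed through its gradient.
rng = np.random.default_rng(9)
m, d = 80, 30
A = rng.normal(size=(m, d))
b = A @ np.maximum(rng.normal(size=d), 0.0)
lam = 0.5

def prox_f(v, gamma):
    return np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)

def prox_g(v):
    return np.maximum(v, 0.0)

def grad_h(x):
    return A.T @ (A @ x - b)

gamma = 1.0 / np.linalg.norm(A, 2) ** 2          # step size below 2/L for the smooth term
z = np.zeros(d)
for _ in range(2000):
    x_g = prox_g(z)
    x_f = prox_f(2 * x_g - z - gamma * grad_h(x_g), gamma)
    z = z + (x_f - x_g)                          # relaxation parameter fixed at 1
x = prox_g(z)
print(0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum())
```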