Stochastic model-based minimization of weakly convex functions

We consider an algorithm that successively samples and minimizes stochastic models of the objective function. We show that under weak-convexity and Lipschitz conditions, the algorithm drives the expected norm of the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$. Our result yields the first complexity guarantees for the stochastic proximal point algorithm …
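
As a concrete instance of such a scheme, the Python sketch below implements one stochastic proximal point step: sample a loss, then minimize it plus a proximal quadratic around the current iterate. The helper names (`sample_loss`, `stochastic_proximal_point_step`) and the use of a generic inner solver are our own illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def stochastic_proximal_point_step(x, sample_loss, beta):
    """One stochastic proximal point step (illustrative sketch):
    minimize the freshly sampled loss plus the proximal penalty
    (1/(2*beta)) * ||y - x||^2 centered at the current iterate x.
    A generic solver stands in for whatever subproblem solver is appropriate."""
    objective = lambda y: sample_loss(y) + np.sum((y - x) ** 2) / (2.0 * beta)
    return minimize(objective, x, method="BFGS").x

# Hypothetical usage on a single sampled least-squares datum (a, b):
# a, b = np.random.randn(5), 0.3
# x_next = stochastic_proximal_point_step(np.zeros(5), lambda y: 0.5 * (a @ y - b) ** 2, beta=1.0)
```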

Stochastic subgradient method converges at the rate $O(k^{-1/4})$ on weakly convex functions

We prove that the projected stochastic subgradient method, applied to a weakly convex problem, drives the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$.
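
A minimal Python sketch of the projected stochastic subgradient iteration follows; the oracle and projection callables and the $\gamma_0/\sqrt{k+1}$ step-size schedule are illustrative choices on our part rather than the paper's exact setup.

```python
import numpy as np

def projected_stochastic_subgradient(x0, subgrad_oracle, project, num_steps, gamma0=1.0):
    """Projected stochastic subgradient method (sketch).
    subgrad_oracle(x) returns a stochastic subgradient of the weakly convex
    objective at x; project(x) is Euclidean projection onto the closed convex
    constraint set; gamma0 / sqrt(k+1) is one common diminishing step size."""
    x = np.asarray(x0, dtype=float)
    for k in range(num_steps):
        g = subgrad_oracle(x)
        x = project(x - (gamma0 / np.sqrt(k + 1)) * g)
    return x

# Hypothetical usage: constrain the iterates to the unit Euclidean ball.
# project = lambda x: x / max(1.0, np.linalg.norm(x))
```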

The nonsmooth landscape of phase retrieval

We consider a popular nonsmooth formulation of the real phase retrieval problem. We show that under standard statistical assumptions, a simple subgradient method converges linearly when initialized within a constant relative distance of an optimal solution. Seeking to understand the distribution of the stationary points of the problem, we complete the paper by proving that …
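
To make the setting concrete, here is a hedged Python sketch of a subgradient method with a Polyak-type step applied to the nonsmooth formulation $f(x) = \frac{1}{m}\sum_i |(a_i^T x)^2 - b_i|$; the specific step rule and variable names are our illustrative choices, not necessarily those analyzed in the paper.

```python
import numpy as np

def phase_retrieval_subgradient(A, b, x0, num_steps, f_min=0.0):
    """Subgradient method with a Polyak-type step (sketch) for the nonsmooth
    phase retrieval objective f(x) = mean_i |(a_i^T x)^2 - b_i|.
    Rows of A are the measurement vectors a_i; in the noiseless model
    b_i = (a_i^T x_true)^2, so the optimal value f_min is 0."""
    x = np.asarray(x0, dtype=float)
    m = len(b)
    for _ in range(num_steps):
        Ax = A @ x
        r = Ax ** 2 - b
        f_val = np.mean(np.abs(r))
        g = (2.0 / m) * (A.T @ (np.sign(r) * Ax))   # a subgradient of f at x
        gnorm2 = float(np.dot(g, g))
        if gnorm2 == 0.0:
            break
        x = x - ((f_val - f_min) / gnorm2) * g       # Polyak step
    return x
```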

The Many Faces of Degeneracy in Conic Optimization

Slater’s condition — existence of a “strictly feasible solution” — is a common assumption in conic optimization. Without strict feasibility, first-order optimality conditions may be meaningless, the dual problem may yield little information about the primal, and small changes in the data may render the problem infeasible. Hence, failure of strict feasibility can negatively impact …
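
A tiny example (ours, not the paper's) shows how strict feasibility can fail. Consider the semidefinite program

$$\min_{X \succeq 0} \; 2X_{12} \quad \text{subject to} \quad X_{11} = 0.$$

Every feasible $2 \times 2$ matrix must have its first row and column equal to zero (positive semidefiniteness together with $X_{11}=0$ forces $X_{12}=0$), so no feasible point lies in the interior of the positive semidefinite cone and Slater's condition fails.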

Foundations of gauge and perspective duality

Common numerical methods for constrained convex optimization are predicated on efficiently computing nearest points to the feasible region. The presence of a design matrix in the constraints yields feasible regions with more complex geometries. When the functional components are gauges, there is an equivalent optimization problem, the gauge dual, where the matrix appears only in the …
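
In the notation commonly used in the gauge duality literature (our rendering, not quoted from the paper), the primal and gauge dual pair takes the form

$$\min_{x} \; \kappa(x) \ \ \text{s.t.} \ \ \rho(b - Ax) \le \sigma, \qquad\qquad \min_{y} \; \kappa^{\circ}(A^{T} y) \ \ \text{s.t.} \ \ \langle b, y \rangle - \sigma\, \rho^{\circ}(y) \ge 1,$$

where $\kappa^{\circ}$ and $\rho^{\circ}$ denote the polar gauges; the design matrix $A$ enters the dual only through the objective, so projecting onto the dual feasible region does not involve $A$.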

Efficiency of minimizing compositions of convex functions and smooth maps

We consider the problem of minimizing a sum of a convex function and a composition of a convex function with a smooth map. Important examples include exact penalty formulations of nonlinear programs and nonlinear least squares problems with side constraints. The basic algorithm we rely on is the well-known prox-linear method, which in each iteration …
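
For reference, the prox-linear update for a problem of the form $\min_x \, g(x) + h(c(x))$, with $g$ and $h$ convex and $c$ smooth, is typically written (in our notation) as

$$x_{k+1} = \operatorname*{argmin}_{y} \; g(y) + h\big(c(x_k) + \nabla c(x_k)(y - x_k)\big) + \frac{1}{2t}\|y - x_k\|^{2},$$

so each iteration minimizes a convex model obtained by linearizing the smooth map $c$ inside $h$ and adding a proximal quadratic.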

Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria

We consider optimization algorithms that successively minimize simple Taylor-like models of the objective function. Methods of Gauss-Newton type for minimizing the composition of a convex function and a smooth map are common examples. Our main result is an explicit relationship between the step-size of any such algorithm and the slope of the function at a …
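
Schematically (our paraphrase, with constants suppressed), such a method chooses a model $f_{x}$ that tracks $f$ to within a quadratic error, $|f_{x}(y) - f(y)| \le \frac{\eta}{2}\|y - x\|^{2}$, and sets $x_{k+1} \in \operatorname{argmin}_y f_{x_k}(y)$; a result of the type described above then guarantees a point $\hat{x}$ within distance $O(\|x_{k+1} - x_k\|)$ of $x_{k+1}$ whose slope is $O(\eta\, \|x_{k+1} - x_k\|)$, which is what justifies terminating when the step is small.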

An optimal first order method based on optimal quadratic averaging

In a recent paper, Bubeck, Lee, and Singh introduced a new first order method for minimizing smooth strongly convex functions. Their geometric descent algorithm, largely inspired by the ellipsoid method, enjoys the optimal linear rate of convergence. Motivated by their work, we propose a close variant that iteratively maintains a quadratic global under-estimator of the …
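
Roughly (our summary): each gradient evaluation of a $\mu$-strongly convex $f$ yields the quadratic lower bound $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2}\|y - x\|^{2}$, and the method maintains a running under-estimator of the same shape, $Q_k(y) = v_k + \frac{\mu}{2}\|y - c_k\|^{2}$, formed by optimally averaging the previous under-estimator with the newest quadratic; the minimum value $v_k$ is then a certified lower bound on the optimal value, and the gap $f(x_k) - v_k$ provides a natural termination criterion.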

The Euclidean distance degree of orthogonally invariant matrix varieties

The Euclidean distance degree of a real variety is an important invariant arising in distance minimization problems. We show that the Euclidean distance degree of an orthogonally invariant matrix variety equals the Euclidean distance degree of its restriction to diagonal matrices. We illustrate how this result can greatly simplify calculations in concrete circumstances.
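
For illustration (an example consistent with the stated result, not taken from the paper): for the variety of real $n \times t$ matrices of rank at most $r$, the restriction to diagonal matrices is the set of vectors in $\mathbb{R}^{\min(n,t)}$ with at most $r$ nonzero entries. A generic point has exactly $\binom{\min(n,t)}{r}$ critical points of the squared distance to that set, one per choice of which $r$ coordinates to keep, matching the classical Eckart–Young count of critical low-rank approximations obtained by selecting $r$ of the singular values.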

Error bounds, quadratic growth, and linear convergence of proximal methods

We show that the error bound property, postulating that the step lengths of the proximal gradient method linearly bound the distance to the solution set, is equivalent to a standard quadratic growth condition. We exploit this equivalence in an analysis of asymptotic linear convergence of the proximal gradient algorithm for structured problems, which lack …
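
In generic notation (ours), for $F = f + g$ with $f$ smooth and $g$ closed convex, solution set $S$, and proximal gradient step $x^{+} = \operatorname{prox}_{t g}\!\big(x - t \nabla f(x)\big)$, the two conditions in question read

$$\operatorname{dist}(x, S) \le \gamma \, \|x^{+} - x\| \qquad \text{and} \qquad F(x) \ge \min F + \frac{\alpha}{2}\operatorname{dist}^{2}(x, S)$$

for all $x$ with $F(x)$ close to the minimal value; the equivalence of these two properties (up to the constants involved) is what drives the linear convergence analysis of the proximal gradient method.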