Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis

Average-case analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worst-case analysis, it is more representative of the typical behavior of an algorithm, but remains largely unexplored in optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, we show … Read more

A termination criterion for stochastic gradient descent for binary classification

We propose a new, simple, and computationally inexpensive termination test for constant step-size stochastic gradient descent (SGD) applied to binary classification on the logistic and hinge loss with homogeneous linear predictors. Our theoretical results support the effectiveness of our stopping criterion when the data is Gaussian distributed. This presence of noise allows for the possibility … Read more

Potential-based analyses of first-order methods for constrained and composite optimization

We propose potential-based analyses for first-order algorithms applied to constrained and composite minimization problems. We first propose “idealized” frameworks for algorithms in the strongly and non-strongly convex cases and argue based on a potential that methods following the framework achieve the best possible rate. Then we show that the geometric descent (GD) algorithm by Bubeck … Read more

The nonsmooth landscape of phase retrieval

We consider a popular nonsmooth formulation of the real phase retrieval problem. We show that under standard statistical assumptions, a simple subgradient method converges linearly when initialized within a constant relative distance of an optimal solution. Seeking to understand the distribution of the stationary points of the problem, we complete the paper by proving that … Read more

Efficiency of minimizing compositions of convex functions and smooth maps

We consider the problem of minimizing a sum of a convex function and a composition of a convex function with a smooth map. Important examples include exact penalty formulations of nonlinear programs and nonlinear least squares problems with side constraints. The basic algorithm we rely on is the well-known prox-linear method, which in each iteration … Read more

Variational analysis of spectral functions simplified

Spectral functions of symmetric matrices — those depending on matrices only through their eigenvalues — appear often in optimization. A cornerstone variational analytic tool for studying such functions is a formula relating their subdifferentials to the subdifferentials of their diagonal restrictions. This paper presents a new, short, and revealing derivation of this result. We then … Read more