Iteration Complexity of Fixed-Step Methods by Nesterov and Polyak for Convex Quadratic Functions

This note considers the momentum method by Polyak and the accelerated gradient method by Nesterov, both without line search but with a fixed step length, applied to strictly convex quadratic functions, assuming that exact gradients are used and that appropriate upper and lower bounds on the extreme eigenvalues of the Hessian matrix are known. Simple 2D examples show … Read more
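As a minimal sketch of this setting (using the textbook fixed-step parameters for strongly convex quadratics, which need not coincide with the choices analyzed in the note), the following compares Polyak's heavy-ball method and Nesterov's accelerated gradient on a simple 2D quadratic with known eigenvalue bounds mu and L:

```python
import numpy as np

# Sketch only: fixed-step heavy ball and Nesterov acceleration for the strictly
# convex quadratic f(x) = 0.5 * x^T A x, assuming bounds mu <= lambda_min(A)
# and L >= lambda_max(A) are known.
A = np.diag([1.0, 100.0])          # simple 2D example, condition number 100
mu, L = 1.0, 100.0                 # lower/upper eigenvalue bounds
grad = lambda x: A @ x

def heavy_ball(x0, iters=200):
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x_next = x - alpha * grad(x) + beta * (x - x_prev)   # momentum step
        x_prev, x = x, x_next
    return x

def nesterov(x0, iters=200):
    alpha = 1.0 / L
    beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)            # extrapolation
        x_prev, x = x, y - alpha * grad(y)     # gradient step at extrapolated point
    return x

x0 = np.array([1.0, 1.0])
print(heavy_ball(x0), nesterov(x0))   # both should approach the minimizer 0
```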

A Riemannian ADMM

We consider a class of Riemannian optimization problems where the objective is the sum of a smooth function and a nonsmooth function considered in the ambient space. This class of problems finds important applications in machine learning and statistics, such as sparse principal component analysis, sparse spectral clustering, and orthogonal dictionary learning. We propose … Read more
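As one illustrative instance of this problem class (my formulation, not necessarily the one treated in the paper), sparse principal component analysis can be written as a smooth term over the Stiefel manifold plus a nonsmooth $\ell_1$ penalty defined in the ambient space:
\[
\min_{X \in \mathrm{St}(n,p)} \; -\operatorname{tr}\!\big(X^\top A^\top A X\big) + \lambda \|X\|_1,
\qquad
\mathrm{St}(n,p) = \{X \in \mathbb{R}^{n \times p} : X^\top X = I_p\},
\]
where $A$ is the data matrix and $\lambda > 0$ controls sparsity.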

Inexact Proximal-Gradient Methods with Support Identification

We consider the proximal-gradient method for minimizing an objective function that is the sum of a smooth function and a non-smooth convex function. A feature that distinguishes our work from most in the literature is that we assume that the associated proximal operator does not admit a closed-form solution. To address this challenge, we study … Read more
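The following is a minimal sketch (my own illustration, not the paper's algorithm or its support-identification analysis) of a proximal-gradient loop in which the proximal operator admits no closed form and is approximated by a truncated inner solver; here $g(x)=\lambda\|Dx\|_1$, and its prox subproblem is solved inexactly by a few dual projected-gradient steps:

```python
import numpy as np

# Sketch only: inexact proximal gradient for f(x) + g(x) with
# f(x) = 0.5*||A x - b||^2 (smooth) and g(x) = lam*||D x||_1, whose proximal
# operator has no closed form and is approximated by an inner iterative solver.
rng = np.random.default_rng(0)
m, n = 30, 20
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
D = np.eye(n) - np.eye(n, k=1)          # finite-difference operator
lam = 0.1
L_f = np.linalg.norm(A, 2) ** 2         # Lipschitz constant of grad f

def inexact_prox(v, t, inner_iters):
    """Approximate prox_{t*g}(v) by projected gradient on its dual:
       the result is v - D^T u with u constrained to the box [-t*lam, t*lam]."""
    u = np.zeros(D.shape[0])
    step = 1.0 / np.linalg.norm(D @ D.T, 2)
    for _ in range(inner_iters):
        u = np.clip(u + step * D @ (v - D.T @ u), -t * lam, t * lam)
    return v - D.T @ u

x = np.zeros(n)
for k in range(200):
    grad = A.T @ (A @ x - b)
    # Inner accuracy increases with k; this schedule is illustrative only.
    x = inexact_prox(x - grad / L_f, 1.0 / L_f, inner_iters=5 + k // 10)
print(0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(D @ x).sum())
```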

Robust Two-Stage Optimization with Covariate Data

We consider a generalization of two-stage decision problems in which the second-stage decision may be a function of a predictive signal but cannot adapt fully to the realized uncertainty. We will show how such problems can be learned from sample data by considering a family of regularized sample average formulations. Furthermore, our regularized data-driven formulations … Read more
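As a hedged sketch of what a regularized sample average formulation can look like in general (my notation; the paper's family of formulations may differ), given samples $(s_i, \xi_i)$ of the predictive signal and the uncertainty, a first-stage decision $x$, a restricted policy class $\mathcal{Z}$ for the signal-dependent second stage, and a regularizer $R$ with weight $\rho$:
\[
\min_{x \in \mathcal{X},\; z \in \mathcal{Z}} \;
\frac{1}{N} \sum_{i=1}^{N} c\big(x,\, z(s_i),\, \xi_i\big) \;+\; \rho\, R(z).
\]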

RandProx: Primal-Dual Optimization Algorithms with Randomized Proximal Updates

Proximal splitting algorithms are well suited to solving large-scale nonsmooth optimization problems, in particular those arising in machine learning. We propose a new primal-dual algorithm in which the dual update is randomized; equivalently, the proximity operator of one of the functions in the problem is replaced by a stochastic oracle. For instance, some randomly chosen … Read more
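To make "randomized dual update" concrete, here is a toy primal-dual (Chambolle-Pock-style) loop in which the dual proximal step is applied only with probability $p$; this is only my illustration of the idea, not the RandProx algorithm, and no convergence claim is made for this naive variant:

```python
import numpy as np

# Toy primal-dual iteration for min_x 0.5*||x - c||^2 + lam*||D x||_1, where the
# dual prox update is performed only with probability p (a stochastic oracle
# standing in for the deterministic proximity operator). Illustration only.
rng = np.random.default_rng(1)
n, lam, p = 20, 0.1, 0.5
c = rng.standard_normal(n)
D = np.eye(n) - np.eye(n, k=1)
L_norm = np.linalg.norm(D, 2)
tau = sigma = 0.9 / L_norm              # step sizes with tau*sigma*||D||^2 < 1

x, x_bar, y = np.zeros(n), np.zeros(n), np.zeros(n)
for _ in range(500):
    if rng.random() < p:
        # dual prox: projection onto the l_inf ball of radius lam
        y = np.clip(y + sigma * D @ x_bar, -lam, lam)
    x_new = (x - tau * D.T @ y + tau * c) / (1.0 + tau)   # prox of 0.5*||.-c||^2
    x_bar = 2.0 * x_new - x                               # extrapolation
    x = x_new
print(0.5 * np.linalg.norm(x - c) ** 2 + lam * np.abs(D @ x).sum())
```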

Inertial Krasnoselskii-Mann Iterations

We establish the weak convergence of inertial Krasnoselskii-Mann iterations towards a common fixed point of a family of quasi-nonexpansive operators, along with worst-case estimates for the rate at which the residuals vanish. Strong and linear convergence are obtained in the quasi-contractive setting. In both cases, we highlight the relationship with the non-inertial case, and … Read more
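For concreteness, a common way to write an inertial Krasnoselskii-Mann step for an operator $T$ (parameter conventions vary; this is a generic sketch rather than the paper's exact scheme) is
\[
y_k = x_k + \alpha_k (x_k - x_{k-1}), \qquad
x_{k+1} = (1 - \lambda_k)\, y_k + \lambda_k\, T(y_k),
\]
with inertial parameters $\alpha_k$ and relaxation parameters $\lambda_k \in (0,1)$.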

The cost of nonconvexity in deterministic nonsmooth optimization

We study the impact of nonconvexity on the complexity of nonsmooth optimization, emphasizing objectives such as piecewise linear functions, which may not be weakly convex. We focus on a dimension-independent analysis, slightly modifying a black-box algorithm of Zhang et al. that approximates an $\epsilon$-stationary point of any directionally differentiable Lipschitz objective using $O(\epsilon^{-4})$ calls to … Read more

The projective exact penalty method for general constrained optimization

A new projective exact penalty function method is proposed for the equivalent reduction of constrained optimization problems to nonsmooth unconstrained ones. In this method, the original objective function is extended to infeasible points by summing its value at the projection of an infeasible point onto the feasible set with the distance to that projection. The … Read more
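In symbols, the construction described above extends the objective $f$ from the feasible set $S$ to all of $\mathbb{R}^n$ via the projection $P_S$ (any scaling of the distance term used in the paper is omitted here):
\[
F(x) = f\big(P_S(x)\big) + \operatorname{dist}(x, S) = f\big(P_S(x)\big) + \big\|x - P_S(x)\big\|,
\]
so that minimizing $F$ over $\mathbb{R}^n$ is intended to be equivalent to minimizing $f$ over $S$.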

Fixed-Point Automatic Differentiation of Forward–Backward Splitting Algorithms for Partly Smooth Functions

A large class of non-smooth practical optimization problems can be written as the minimization of a sum of smooth and partly smooth functions. We consider such structured problems that also depend on a parameter vector and study the problem of differentiating their solution mapping with respect to the parameter, which has far-reaching applications in sensitivity … Read more
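As a reminder of the generic fixed-point mechanism (not the paper's partly smooth analysis), the forward-backward map $F(x,\theta) = \operatorname{prox}_{\gamma g(\cdot,\theta)}\big(x - \gamma \nabla_x f(x,\theta)\big)$ has a fixed point $x^\star(\theta) = F\big(x^\star(\theta), \theta\big)$, and, whenever the implicit function theorem applies, differentiating this identity gives
\[
\frac{\partial x^\star}{\partial \theta}
= \big(I - \partial_x F(x^\star, \theta)\big)^{-1} \partial_\theta F(x^\star, \theta).
\]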

Optimized convergence of stochastic gradient descent by weighted averaging

Under mild assumptions, stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the arithmetic mean of all iterates converges considerably more slowly to the optimal solution than the iterates themselves. Also in the … Read more
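The following sketch shows what weighted averaging of SGD iterates means in practice; the linearly increasing weights and the step-size schedule are illustrative choices only, not necessarily the optimized weights studied in the paper:

```python
import numpy as np

# Sketch: SGD on a noisy least-squares problem, returning the last iterate, the
# uniform (arithmetic-mean) average, and a weighted average with weights w_t = t.
rng = np.random.default_rng(2)
n, T = 10, 5000
x_true = rng.standard_normal(n)

x = np.zeros(n)
sum_uniform = np.zeros(n)
sum_weighted = np.zeros(n)
weight_total = 0.0
for t in range(1, T + 1):
    a = rng.standard_normal(n)                    # random data point
    noise = 0.01 * rng.standard_normal()
    g = (a @ x - a @ x_true - noise) * a          # stochastic gradient
    x -= g / (n + t)                              # decreasing step; offset keeps early steps stable
    sum_uniform += x
    sum_weighted += t * x                         # weight w_t = t
    weight_total += t

for name, est in [("last iterate", x),
                  ("uniform average", sum_uniform / T),
                  ("weighted average", sum_weighted / weight_total)]:
    print(name, np.linalg.norm(est - x_true))
```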