Scalable Projection-Free Optimization Methods via MultiRadial Duality Theory

Recent works have developed new projection-free first-order methods based on utilizing linesearches and normal vector computations to maintain feasibility. These oracles can be cheaper than orthogonal projection or linear optimization subroutines but have the drawback of requiring a known strictly feasible point to do these linesearches with respect to. In this work, we develop new … Read more

On Averaging and Extrapolation for Gradient Descent

\(\) This work considers the effect of averaging, and more generally extrapolation, of the iterates of gradient descent in smooth convex optimization. After running the method, rather than reporting the final iterate, one can report either a convex combination of the iterates (averaging) or a generic combination of the iterates (extrapolation). For several common stepsize … Read more

Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization

We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences enable $O(1/T)$ convergence guarantees in terms of both their classic primal gap and a … Read more

Goldstein Stationarity in Lipschitz Constrained Optimization

We prove the first convergence guarantees for a subgradient method minimizing a generic Lipschitz function over generic Lipschitz inequality constraints. No smoothness or convexity (or weak convexity) assumptions are made. Instead, we utilize a sequence of recent advances in Lipschitz unconstrained minimization, which showed convergence rates of $O(1/\delta\epsilon^3)$ towards reaching a “Goldstein” stationary point, that … Read more

Accelerated Gradient Descent via Long Steps

Recently Grimmer [1] showed for smooth convex optimization by utilizing longer steps periodically, gradient descent’s state-of-the-art O(1/T) convergence guarantees can be improved by constant factors, conjecturing an accelerated rate strictly faster than O(1/T) could be possible. Here we prove such a big-O gain, establishing gradient descent’s first accelerated convergence rate in this setting. Namely, we … Read more

Provably Faster Gradient Descent via Long Steps

This work establishes provably faster convergence rates for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We … Read more

First-Order Methods for Nonsmooth Nonconvex Functional Constrained Optimization with or without Slater Points

Constrained optimization problems where both the objective and constraints may be nonsmooth and nonconvex arise across many learning and data science settings. In this paper, we show a simple first-order method finds a feasible, ϵ-stationary point at a convergence rate of O(ϵ−4) without relying on compactness or Constraint Qualification (CQ). When CQ holds, this convergence is measured by … Read more

On Optimal Universal First-Order Methods for Minimizing Heterogeneous Sums

This work considers minimizing a convex sum of functions, each with potentially different structure ranging from nonsmooth to smooth, Lipschitz to non-Lipschitz. Nesterov’s universal fast gradient method provides an optimal black-box first-order method for minimizing a single function that takes advantage of any continuity structure present without requiring prior knowledge. In this paper, we show … Read more

Optimal Convergence Rates for the Proximal Bundle Method

We study convergence rates of the classic proximal bundle method for a variety of nonsmooth convex optimization problems. We show that, without any modification, this algorithm adapts to converge faster in the presence of smoothness or a Hölder growth condition. Our analysis reveals that with a constant stepsize, the bundle method is adaptive, yet it … Read more

Radial Duality Part I: Foundations

(Renegar, 2016) introduced a novel approach to transforming generic conic optimization problems into unconstrained, uniformly Lipschitz continuous minimization. We introduce radial transformations generalizing these ideas, equipped with an entirely new motivation and development that avoids any reliance on convex cones or functions. Perhaps of greatest practical importance, this facilitates the development of new families of … Read more