Preconditioned Gradient Descent for Overparameterized Nonconvex Burer–Monteiro Factorization with Global Optimality Certification

We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally \emph{rank deficient}, then its … Read more

Survey Descent: A Multipoint Generalization of Gradient Descent for Nonsmooth Optimization

For strongly convex objectives that are smooth, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging. Even when the objective is smooth at every iterate, the corresponding local models are unstable and the number of cutting planes invoked by traditional remedies is … Read more

We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don’t increase the stepsize too fast and 2) don’t overstep the local curvature. No need for functional values, no line search, no information about the function except for the gradients. By following these rules, you get a method adaptive … Read more

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

In this paper, we propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. We apply SPIDER to two tasks, namely the stochastic first-order and zeroth-order methods. For stochastic first-order method, combining SPIDER with normalized gradient descent, we propose … Read more

Complexity of gradient descent for multiobjective optimization

A number of first-order methods have been proposed for smooth multiobjective optimization for which some form of convergence to first order criticality has been proved. Such convergence is global in the sense of being independent of the starting point. In this paper we analyze the rate of convergence of gradient descent for smooth unconstrained multiobjective … Read more

New analysis of linear convergence of gradient-type methods via unifying error bound conditions

The subject of linear convergence of gradient-type methods on non-strongly convex optimization has been widely studied by introducing several notions as sufficient conditions. Influential examples include the error bound property, the restricted strongly convex property, the quadratic growth property, and the Kurdyka-{\L}ojasiewicz property. In this paper, we first define a group of error bound conditions … Read more

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Lojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Lojasiewicz (PL) … Read more

Gradient Descent only Converges to Minimizers

We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory. Article Download View Gradient Descent only Converges to Minimizers

Global convergence of the Heavy-ball method for convex optimization

This paper establishes global convergence and provides global bounds of the convergence rate of the Heavy-ball method for convex optimization problems. When the objective function has Lipschitz-continuous gradient, we show that the Cesa ́ro average of the iterates converges to the optimum at a rate of $O(1/k)$ where k is the number of iterations. When … Read more

On the Convergence of Decentralized Gradient Descent

Consider the consensus problem of minimizing $f(x)=\sum_{i=1}^n f_i(x)$ where each $f_i$ is only known to one individual agent $i$ out of a connected network of $n$ agents. All the agents shall collaboratively solve this problem and obtain the solution subject to data exchanges restricted to between neighboring agents. Such algorithms avoid the need of a … Read more