Global Convergence in Deep Learning with Variable Splitting via the Kurdyka-{\L}ojasiewicz Property

Deep learning has recently attracted a significant amount of attention due to its great empirical success. However, the effectiveness in training deep neural networks (DNNs) remains a mystery in the associated nonconvex optimizations. In this paper, we aim to provide some theoretical understanding on such optimization problems. In particular, the Kurdyka-{\L}ojasiewicz (KL) property is established … Read more

Let’s Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

Block coordinate descent (BCD) methods are widely-used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper … Read more

Global Convergence of ADMM in Nonconvex Nonsmooth Optimization

In this paper, we analyze the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function, $\phi(x_0,\ldots,x_p,y)$, subject to coupled linear equality constraints. Our ADMM updates each of the primal variables $x_0,\ldots,x_p,y$, followed by updating the dual variable. We separate the variable $y$ from $x_i$’s as it … Read more

A New First-order Algorithmic Framework for Optimization Problems with Orthogonality Constraints

In this paper, we consider a class of optimization problems with orthogonality constraints, the feasible region of which is called the Stiefel manifold. Our new framework combines a function value reduction step with a correction step. Different from the existing approaches, the function value reduction step of our algorithmic framework searches along the standard Euclidean … Read more

The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM

We introduce the Stochastic Asynchronous Proximal Alternating Linearized Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient method for solving nonconvex, nonsmooth optimization problems. SAPALM is the first asynchronous parallel optimization method that provably converges on a large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the best known rates of convergence — among … Read more

Global Convergence of ADMM in Nonconvex Nonsmooth Optimization

In this paper, we analyze the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function, $\phi(x_1,\ldots,x_p,y)$, subject to linear equality constraints that couple $x_1,\ldots,x_p,y$, where $p\ge 1$ is an integer. Our ADMM sequentially updates the primal variables in the order $x_1,\ldots,x_p,y$, followed by updating the dual … Read more

A remark on accelerated block coordinate descent for computing the proximity operators of a sum of convex functions

We analyze alternating descent algorithms for minimizing the sum of a quadratic function and block separable non-smooth functions. In case the quadratic interactions between the blocks are pairwise, we show that the schemes can be accelerated, leading to improved convergence rates with respect to related accelerated parallel proximal descent. As an application we obtain very … Read more

Randomized First-order Methods for Saddle Point Optimization

In this paper, we present novel randomized algorithms for solving saddle point problems whose dual feasible region is a direct product of many convex sets. Our algorithms can achieve ${\cal O}(1/N)$ rate of convergence by solving only one dual subproblem at each iteration. Our algorithms can also achieve ${\cal O}(1/N^2)$ rate of convergence if a … Read more

Block stochastic gradient iteration for convex and nonconvex optimization

The stochastic gradient (SG) method can minimize an objective function composed of a large number of differentiable functions, or solve a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, handles problems with multiple blocks of variables by updating them one at a time; when the blocks … Read more

Robust Block Coordinate Descent

In this paper we present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm performance is more robust when applied to highly nonseparable or ill conditioned problems. We call the method Robust Coordinate Descent (RCD). At … Read more