Block cubic Newton with greedy selection

A second-order block coordinate descent method is proposed for the unconstrained minimization of an objective function with Lipschitz continuous Hessian. At each iteration, a block of variables is selected by means of a greedy (Gauss-Southwell) rule which considers the amount of first-order stationarity violation, then an approximate minimizer of a cubic model is computed for … Read more

Projection free methods on product domains

Projection-free block-coordinate methods avoid high computational cost per iteration and at the same time exploit the particular problem structure of product domains. Frank-Wolfe-like approaches rank among the most popular ones of this type. However, as observed in the literature, there was a gap between the classical Frank-Wolfe theory and the block-coordinate case. Moreover, most of … Read more

Global Convergence in Deep Learning with Variable Splitting via the Kurdyka-{\L}ojasiewicz Property

Deep learning has recently attracted a significant amount of attention due to its great empirical success. However, the effectiveness in training deep neural networks (DNNs) remains a mystery in the associated nonconvex optimizations. In this paper, we aim to provide some theoretical understanding on such optimization problems. In particular, the Kurdyka-{\L}ojasiewicz (KL) property is established … Read more

Let’s Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

Block coordinate descent (BCD) methods are widely-used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper … Read more

Global Convergence of ADMM in Nonconvex Nonsmooth Optimization

In this paper, we analyze the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function, $\phi(x_0,\ldots,x_p,y)$, subject to coupled linear equality constraints. Our ADMM updates each of the primal variables $x_0,\ldots,x_p,y$, followed by updating the dual variable. We separate the variable $y$ from $x_i$’s as it … Read more

A New First-order Algorithmic Framework for Optimization Problems with Orthogonality Constraints

In this paper, we consider a class of optimization problems with orthogonality constraints, the feasible region of which is called the Stiefel manifold. Our new framework combines a function value reduction step with a correction step. Different from the existing approaches, the function value reduction step of our algorithmic framework searches along the standard Euclidean … Read more

The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM

We introduce the Stochastic Asynchronous Proximal Alternating Linearized Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient method for solving nonconvex, nonsmooth optimization problems. SAPALM is the first asynchronous parallel optimization method that provably converges on a large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the best known rates of convergence — among … Read more

Global Convergence of ADMM in Nonconvex Nonsmooth Optimization

In this paper, we analyze the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function, $\phi(x_1,\ldots,x_p,y)$, subject to linear equality constraints that couple $x_1,\ldots,x_p,y$, where $p\ge 1$ is an integer. Our ADMM sequentially updates the primal variables in the order $x_1,\ldots,x_p,y$, followed by updating the dual … Read more

A remark on accelerated block coordinate descent for computing the proximity operators of a sum of convex functions

We analyze alternating descent algorithms for minimizing the sum of a quadratic function and block separable non-smooth functions. In case the quadratic interactions between the blocks are pairwise, we show that the schemes can be accelerated, leading to improved convergence rates with respect to related accelerated parallel proximal descent. As an application we obtain very … Read more

Randomized First-order Methods for Saddle Point Optimization

In this paper, we present novel randomized algorithms for solving saddle point problems whose dual feasible region is a direct product of many convex sets. Our algorithms can achieve ${\cal O}(1/N)$ rate of convergence by solving only one dual subproblem at each iteration. Our algorithms can also achieve ${\cal O}(1/N^2)$ rate of convergence if a … Read more