Optimization for Supervised Machine Learning: Randomized Algorithms for Data and Parameters

Many key problems in machine learning and data science are routinely modeled as optimization problems and solved via optimization algorithms. With the increase of the volume of data and the size and complexity of the statistical models used to formulate these often ill-conditioned optimization tasks, there is a need for new efficient algorithms able to … Read more

Coordinate Descent Without Coordinates: Tangent Subspace Descent on Riemannian Manifolds

We extend coordinate descent to manifold domains, and provide convergence analyses for geodesically convex and non-convex smooth objective functions. Our key insight is to draw an analogy between coordinate blocks in Euclidean space and tangent subspaces of a manifold. Hence, our method is called tangent subspace descent (TSD). The core principle behind ensuring convergence of … Read more

Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity

It was recently found that the standard version of multi-block cyclic ADMM diverges. Interestingly, Gaussian Back Substitution ADMM (GBS-ADMM) and symmetric Gauss-Seidel ADMM (sGS-ADMM) do not have the divergence issue. Therefore, it seems that symmetrization can improve the performance of the classical cyclic order. In another recent work, cyclic CD (Coordinate Descent) was shown to … Read more

Analyzing Random Permutations for Cyclic Coordinate Descent

We consider coordinate descent methods on convex quadratic problems, in which exact line searches are performed at each iteration. (This algorithm is identical to Gauss-Seidel on the equivalent symmetric positive definite linear system.) We describe a class of convex quadratic problems for which the random-permutations version of cyclic coordinate descent (RPCD) outperforms the standard cyclic … Read more

Rescaling Algorithms for Linear Programming Part I: Conic feasibility

We propose simple polynomial-time algorithms for two linear conic feasibility problems. For a matrix $A\in \R^{m\times n}$, the {\em kernel problem} requires a positive vector in the kernel of $A$, and the {\em image problem} requires a positive vector in the image of $A^\T$. Both algorithms iterate between simple first order steps and rescaling steps. … Read more

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Lojasiewicz Condition

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Lojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Lojasiewicz (PL) … Read more

Random permutations fix a worst case for cyclic coordinate descent

Variants of the coordinate descent approach for minimizing a nonlinear function are distinguished in part by the order in which coordinates are considered for relaxation. Three common orderings are cyclic (CCD), in which we cycle through the components of $x$ in order; randomized (RCD), in which the component to update is selected randomly and independently … Read more

The Asynchronous PALM Algorithm for Nonsmooth Nonconvex Problems

We introduce the Asynchronous PALM algorithm, a new extension of the Proximal Alternating Linearized Minimization (PALM) algorithm for solving nonconvex nonsmooth optimization problems. Like the PALM algorithm, each step of the Asynchronous PALM algorithm updates a single block of coordinates; but unlike the PALM algorithm, the Asynchronous PALM algorithm eliminates the need for sequential updates … Read more

On the von Neumann and Frank-Wolfe Algorithms with Away Steps

The von Neumann algorithm is a simple coordinate-descent algorithm to determine whether the origin belongs to a polytope generated by a finite set of points. When the origin is in the interior of the polytope, the algorithm generates a sequence of points in the polytope that converges linearly to zero. The algorithm’s rate of convergence … Read more

Coordinate descent algorithms

Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes. They have been used in applications for many years, and their popularity continues to grow because of their usefulness in data analysis, machine learning, and other areas of current interest. This paper describes the fundamentals of the coordinate … Read more