Analyzing Random Permutations for Cyclic Coordinate Descent

We consider coordinate descent methods on convex quadratic problems, in which exact line searches are performed at each iteration. (This algorithm is identical to Gauss-Seidel on the equivalent symmetric positive definite linear system.) We describe a class of convex quadratic problems for which the random-permutations version of cyclic coordinate descent (RPCD) outperforms the standard cyclic … Read more

Optimization Algorithms for Data Analysis

We describe the fundamentals of algorithms for minimizing a smooth nonlinear function, and extensions of these methods to the sum of a smooth function and a convex nonsmooth function. Such objective functions are ubiquitous in data analysis applications, as we illustrate using several examples. We discuss methods that make use of gradient (first-order) information about … Read more

Random permutations fix a worst case for cyclic coordinate descent

Variants of the coordinate descent approach for minimizing a nonlinear function are distinguished in part by the order in which coordinates are considered for relaxation. Three common orderings are cyclic (CCD), in which we cycle through the components of $x$ in order; randomized (RCD), in which the component to update is selected randomly and independently … Read more

Coordinate descent algorithms

Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes. They have been used in applications for many years, and their popularity continues to grow because of their usefulness in data analysis, machine learning, and other areas of current interest. This paper describes the fundamentals of the coordinate … Read more

Beyond the Birkhoff Polytope: Convex Relaxations for Vector Permutation Problems

The Birkhoff polytope (the convex hull of the set of permutation matrices) is frequently invoked in formulating relaxations of optimization problems over permutations. The Birkhoff polytope is represented using Θ(n^2) variables and constraints, significantly more than the n variables one could use to represent a permutation as a vector. Using a recent construction of Goemans … Read more

Local Convergence of an Algorithm for Subspace Identification from Partial Data

GROUSE (Grassmannian Rank-One Update Subspace Estimation) is an iterative algorithm for identifying a linear subspace of $\R^n$ from data consisting of partial observations of random vectors from that subspace. This paper examines local convergence properties of GROUSE, under assumptions on the randomness of the observed vectors, the randomness of the subset of elements observed at … Read more

An Sl1LP-Active Set Approach for Feasibility Restoration in Power Systems

We consider power networks in which it is not possible to satisfy all loads at the demand nodes, due to some attack or disturbance to the network. We formulate a model, based on AC power flow equations, to restore the network to feasibility by shedding load at demand nodes, but doing so in a way … Read more

Validating Sample Average Approximation Solutions with Negatively Dependent Batches

Sample-average approximations (SAA) are a practical means of finding approximate solutions of stochastic programming problems involving an extremely large (or infinite) number of scenarios. SAA can also be used to find estimates of a lower bound on the optimal objective value of the true problem which, when coupled with an upper bound, provides confidence intervals … Read more

Forward – Backward Greedy Algorithms for Atomic – Norm Regularization

In many signal processing applications, one aims to reconstruct a signal that has a simple representation with respect to a certain basis or frame. Fundamental elements of the basis known as “atoms” allow us to define “atomic norms” that can be used to construct convex regularizers for the reconstruction problem. Efficient algorithms are available to … Read more

Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties

We describe an asynchronous parallel stochastic proximal coordinate descent algorithm for minimizing a composite objective function, which consists of a smooth convex function plus a separable convex function. In contrast to previous analyses, our model of asynchronous computation accounts for the fact that components of the unknown vector may be written by some cores simultaneously … Read more