The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates

We propose two numerical algorithms for minimizing the sum of a smooth function and the composition of a nonsmooth function with a linear operator in the fully nonconvex setting. The iterative schemes are formulated in the spirit of the proximal and, respectively, proximal linearized alternating direction method of multipliers. The proximal terms are introduced through … Read more

Convergence rates of Forward-Douglas-Rachford splitting method

Over the past years, operator splitting methods have become ubiquitous for non-smooth optimization owing to their simplicity and efficiency. In this paper, we consider the Forward–Douglas–Rachford splitting method (FDR) [10, 40], and study both global and local convergence rates of this method. For the global rate, we establish an o(1/k) convergence rate in terms of … Read more

A Random Block-Coordinate Douglas-Rachford Splitting Method with Low Computational Complexity for Binary Logistic Regression

In this paper, we propose a new optimization algorithm for sparse logistic regression based on a stochastic version of the Douglas Rachford splitting method. Our algorithm sweeps the training set by randomly selecting a mini-batch of data at each iteration, and it allows us to update the variables in a block coordinate manner. Our approach … Read more

GEP-MSCRA for computing the group zero-norm regularized least squares estimator

This paper concerns with the group zero-norm regularized least squares estimator which, in terms of the variational characterization of the zero-norm, can be obtained from a mathematical program with equilibrium constraints (MPEC). By developing the global exact penalty for the MPEC, this estimator is shown to arise from an exact penalization problem that not only … Read more

A single potential governing convergence of conjugate gradient, accelerated gradient and geometric descent

Nesterov’s accelerated gradient (AG) method for minimizing a smooth strongly convex function $f$ is known to reduce $f({\bf x}_k)-f({\bf x}^*)$ by a factor of $\epsilon\in(0,1)$ after $k=O(\sqrt{L/\ell}\log(1/\epsilon))$ iterations, where $\ell,L$ are the two parameters of smooth strong convexity. Furthermore, it is known that this is the best possible complexity in the function-gradient oracle model of … Read more

Let’s Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

Block coordinate descent (BCD) methods are widely-used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper … Read more

An Algorithm for Piecewise Linear Optimization of Objective Functions in Abs-normal Form

In the paper [11] we derived first order (KKT) and second order (SSC) optimality conditions for functions defined by evaluation programs involving smooth elementals and absolute values. For this class of problems we showed in [12] that the natural algorithm of successive piecewise linear optimization with a proximal term (SPLOP) achieves a linear or even … Read more

Long-Step Path-Following Algorithm for Solving Symmetric Programming Problems with Nonlinear Objective Functions

We describe a long-step path-following algorithm for a class of symmetric programming problems with nonlinear convex objective functions. The complexity estimates similar to the case of a linear-quadratic objective function are established. The results of numerical experiments for the class of optimization problems involving quantum entropy are presented. CitationPreprint, University of Notre Dame, December 2017ArticleDownload … Read more

Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity

We generalize the classic convergence rate theory for subgradient methods to apply to non-Lipschitz functions via a new measure of steepness. For the deterministic projected subgradient method, we derive a global $O(1/\sqrt{T})$ convergence rate for any function with at most exponential growth. Our approach implies generalizations of the standard convergence rates for gradient descent on … Read more

”Active-set complexity” of proximal gradient: How long does it take to find the sparsity pattern?

Proximal gradient methods have been found to be highly effective for solving minimization problems with non-negative constraints or L1-regularization. Under suitable nondegeneracy conditions, it is known that these algorithms identify the optimal sparsity pattern for these types of problems in a finite number of iterations. However, it is not known how many iterations this may … Read more