Minimization over the l1-ball using an active-set non-monotone projected gradient

The l1-ball is a nicely structured feasible set that is widely used in many fields (e.g., machine learning, statistics and signal analysis) to enforce some sparsity in the model solutions. In this paper, we devise an active-set strategy for efficiently dealing with minimization problems over the l1-ball and embed it into a tailored algorithmic scheme … Read more

Rank-one Convexification for Sparse Regression

Sparse regression models are increasingly prevalent due to their ease of interpretability and superior out-of-sample performance. However, the exact model of sparse regression with an L0 constraint restricting the support of the estimators is a challenging non-convex optimization problem. In this paper, we derive new strong convex relaxations for sparse regression. These relaxations are based … Read more

Sparse and Smooth Signal Estimation: Convexification of L0 Formulations

Signal estimation problems with smoothness and sparsity priors can be naturally modeled as quadratic optimization with L0-“norm” constraints. Since such problems are non-convex and hard-to-solve, the standard approach is, instead, to tackle their convex surrogates based on L1-norm relaxations. In this paper, we propose new iterative conic quadratic relaxations that exploit not only the L0-“norm” … Read more

A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems

We develop a fast and robust algorithm for solving large scale convex composite optimization models with an emphasis on the $\ell_1$-regularized least squares regression (Lasso) problems. Despite the fact that there exist a large number of solvers in the literature for the Lasso problems, we found that no solver can efficiently handle difficult large scale … Read more

From error bounds to the complexity of first-order descent methods for convex functions

This paper shows that error bounds can be used as effective tools for deriving complexity results for first-order descent methods in convex minimization. In a first stage, this objective led us to revisit the interplay between error bounds and the Kurdyka-\L ojasiewicz (KL) inequality. One can show the equivalence between the two concepts for convex … Read more

A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives

In this paper we analyze boosting algorithms in linear regression from a new perspective: that of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression, namely the incremental forward stagewise algorithm (FS-epsilon) and least squares boosting (LS-Boost-epsilon), can be viewed as subgradient descent to minimize the loss function defined … Read more

Object-Parallel Infrastructure for Implementing First-Order Methods, with an Example Application to LASSO

We describe the design of a C++ vector-manipulation substrate that allows first-order optimization algorithms to be expressed in a concise and readable manner, yet still achieve high performance in parallel computing environments. We use standard object-oriented techniques of encapsulation and operator overloading, combined with a novel “symbolic temporaries” delayed-evaluation system that greatly reduces the overhead … Read more

Parallel Algorithms for Big Data Optimization

We propose a decomposition framework for the parallel optimization of the sum of a differentiable function and a (block) separable nonsmooth, convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (i.e., sequential) ones, as … Read more

Parallel Coordinate Descent Methods for Big Data Optimization

In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function. The theoretical speedup, as compared to the serial method, and referring to the number of iterations needed … Read more

Safe Feature Elimination in Sparse Supervised Learning

We investigate fast methods that allow to quickly eliminate variables (features) in supervised learning problems involving a convex loss function and a l1 -norm penalty, leading to a potentially substantial reduction in the number of variables prior to running the supervised learning algorithm. The methods are not heuristic: they only eliminate features that are guaranteed … Read more