On the Quality of a Semidefinite Programming Bound for Sparse Principal Component Analysis

We examine the problem of approximating a positive, semidefinite matrix $\Sigma$ by a dyad $xx^T$, with a penalty on the cardinality of the vector $x$. This problem arises in sparse principal component analysis, where a decomposition of $\Sigma$ involving sparse factors is sought. We express this hard, combinatorial problem as a maximum eigenvalue problem, in … Read more

Spectral Bounds for Sparse PCA: Exact & Greedy Algorithms

Sparse PCA seeks approximate sparse “eigenvectors” whose projections capture the maximal variance of data. As a cardinality-constrained and non-convex optimization problem, it is NP-hard and yet it is encountered in a wide range of applied fields, from bio-informatics to finance. Recent progress has focused mainly on continuous approximation and convex relaxation of the hard cardinality … Read more

Sparse Covariance Selection via Robust Maximum Likelihood Estimation

We address a problem of covariance selection, where we seek a trade-off between a high likelihood against the number of non-zero elements in the inverse covariance matrix. We solve a maximum likelihood problem with a penalty term given by the sum of absolute values of the elements of the inverse covariance matrix, and allow for … Read more

A Framework for Kernel Regularization with Applications to Protein Clustering

We develop and apply a novel framework which is designed to extract information in the form of a positive definite kernel matrix from possibly crude, noisy, incomplete, inconsistent dissimilarity information between pairs of objects, obtainable in a variety of contexts. Any positive definite kernel defines a consistent set of distances, and the fitted kernel provides … Read more

A Tabu Search Algorithm for Partitioning

We present an original method for partitioning by automatic classi- fication, using the optimization technique of tabu search. The method uses a classical tabu search scheme based on transfers for the minimization of the within variance; it introduces in the tabu list the indicator of the object transfered. This method is compared with two stochastic … Read more

Optimal expected-distance separating halfspace

One recently proposed criterion to separate two datasets in discriminant analysis, is to use a hyperplane which minimises the sum of distances to it from all the misclassified data points. Here all distances are supposed to be measured by way of some fixed norm, while misclassification means lying on the wrong side of the hyperplane, … Read more

A direct formulation for sparse PCA using semidefinite programming

We examine the problem of approximating, in the Frobenius-norm sense, a positive, semidefinite symmetric matrix by a rank-one matrix, with an upper bound on the cardinality of its eigenvector. The problem arises in the decomposition of a covariance matrix into sparse factors, and has wide applications ranging from biology to finance. We use a modification … Read more

When LP is not a good idea – using structure in polyhedral optimization problems

It has been known for almost 50 years that the discrete l_1 approximation problem can be solved effectively by linear programming. However, improved algorithms involve a step which can be interpreted as a line search, and which is not part of the standard LP solution procedures. l_1 provides the simplest example of a class of … Read more

A New Computational Approach to Density Estimation with Semidefinite Programming

Density estimation is a classical and important problem in statistics. The aim of this paper is to develop a new computational approach to density estimation based on semidefinite programming (SDP), a new technology developed in optimization in the last decade. We express a density as the product of a nonnegative polynomial and a base density … Read more

Quadratic interior-point methods in statistical disclosure control

The safe dissemination of statistical tabular data is one of the main concerns of National Statistical Institutes (NSIs). Although each cell of the tables is made up of the aggregated information of several individuals, the statistical confidentiality can be violated. NSIs must guarantee that no individual information can be derived from the released tables. One … Read more