Statistics – Page 18 – Optimization Online

On the Quality of a Semidefinite Programming Bound for Sparse Principal Component Analysis

Published: 2006/03/10

Robust Optimization, Semi-definite Programming, Statistics quality of relaxation, semidefinite programming, sparse principal component analysis, unsupervised machine learning

We examine the problem of approximating a positive semidefinite matrix $\Sigma$ by a dyad $xx^T$, with a penalty on the cardinality of the vector $x$. This problem arises in sparse principal component analysis, where a decomposition of $\Sigma$ involving sparse factors is sought. We express this hard combinatorial problem as a maximum eigenvalue problem, in …

Spectral Bounds for Sparse PCA: Exact & Greedy Algorithms

Published: 2006/02/17

Shai Avidan

Baback Moghaddam

Yair Weiss

Combinatorial Optimization, Finance and Economics, Statistics courant-fischer theorem, eigenvalue bounds, inclusion principle

Sparse PCA seeks approximate sparse “eigenvectors” whose projections capture the maximal variance of data. As a cardinality-constrained and non-convex optimization problem, it is NP-hard and yet it is encountered in a wide range of applied fields, from bio-informatics to finance. Recent progress has focused mainly on continuous approximation and convex relaxation of the hard cardinality …
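To make the cardinality-constrained problem concrete, here is a minimal sketch of an exhaustive search for very small matrices. It is not the authors' algorithm: it relies only on the standard fact that maximizing $x^T \Sigma x$ over unit vectors with at most $k$ nonzeros equals the largest eigenvalue over all $k \times k$ principal submatrices of $\Sigma$. The function names, the power-iteration helper, and the iteration count are assumptions made for illustration.

```python
import math
from itertools import combinations


def max_eigenvalue(a, iters=500):
    """Largest eigenvalue of a symmetric PSD matrix by power iteration.

    Assumption: the fixed start vector is not orthogonal to the leading
    eigenvector, which holds for generic inputs but is not guaranteed.
    """
    n = len(a)
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-norm) final iterate.
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * av[i] for i in range(n))


def sparse_pca_exhaustive(sigma, k):
    """Return (best variance, best support) over all supports of size k.

    The cost grows combinatorially with the matrix size, so this is only
    usable on tiny examples.
    """
    n = len(sigma)
    best_val, best_support = -1.0, None
    for support in combinations(range(n), k):
        sub = [[sigma[i][j] for j in support] for i in support]
        val = max_eigenvalue(sub)
        if val > best_val:
            best_val, best_support = val, support
    return best_val, best_support
```

Searching only supports of size exactly $k$ is enough, because the largest eigenvalue of a principal submatrix cannot decrease when the support grows.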

Sparse Covariance Selection via Robust Maximum Likelihood Estimation

Published: 2005/07/14

Onureena Banerjee

Alexandre d'Aspremont

Laurent El Ghaoui

Convex Optimization, Semi-definite Programming, Statistics covariance selection, first-order methods, semidefinite programming

We address a problem of covariance selection, where we seek a trade-off between a high likelihood and the number of non-zero elements in the inverse covariance matrix. We solve a maximum likelihood problem with a penalty term given by the sum of absolute values of the elements of the inverse covariance matrix, and allow for …
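As an illustration of the penalized objective described here, the sketch below evaluates a Gaussian log-likelihood (up to constants) minus an absolute-value penalty at a candidate inverse covariance matrix. The exact scaling used in the paper is not visible in the truncated abstract, so the form $\log\det X - \mathrm{tr}(SX) - \rho \sum_{ij} |X_{ij}|$, the symbol $\rho$, and all function names are assumptions.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][j] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def penalized_log_likelihood(x, s, rho):
    """Evaluate log det(X) - trace(S X) - rho * sum |X_ij| (assumed form).

    x   : candidate inverse covariance matrix (list of lists)
    s   : sample covariance matrix
    rho : nonnegative penalty weight (hypothetical parameter name)
    """
    n = len(x)
    factor = cholesky(x)
    log_det = 2.0 * sum(math.log(factor[i][i]) for i in range(n))
    trace_sx = sum(s[i][j] * x[j][i] for i in range(n) for j in range(n))
    l1_penalty = sum(abs(x[i][j]) for i in range(n) for j in range(n))
    return log_det - trace_sx - rho * l1_penalty
```

Larger values of `rho` favor candidates with more zero entries, which is the trade-off the abstract refers to.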

A Framework for Kernel Regularization with Applications to Protein Clustering

Published: 2005/05/06

Biomedical Applications, Linear, Cone and Semidefinite Programming, Statistics classification, convex cone programming, globin family, noisy dissimilarity data, positive definite matrices, protein clustering, regularized kernel estimation, support vector machines

We develop and apply a novel framework which is designed to extract information in the form of a positive definite kernel matrix from possibly crude, noisy, incomplete, inconsistent dissimilarity information between pairs of objects, obtainable in a variety of contexts. Any positive definite kernel defines a consistent set of distances, and the fitted kernel provides …

A Tabu Search Algorithm for Partitioning

Published: 2004/12/03

Alex Murillo

Eduardo Piza

Javier Trejos

Data-Mining, Meta Heuristics, Statistics clustering, data mining, metaheuristics, tabu search

We present an original method for partitioning by automatic classification, using the optimization technique of tabu search. The method uses a classical tabu search scheme based on transfers for the minimization of the within variance; it introduces in the tabu list the indicator of the object transferred. This method is compared with two stochastic …
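The transfer-based scheme can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the tabu tenure, the full re-evaluation of the criterion for every candidate transfer, the absence of an aspiration rule, and all names are assumptions. What it keeps from the abstract is the neighborhood (move one object to another class), the criterion (within-class variance), and a tabu list holding the index of the object just transferred.

```python
import random
from collections import deque


def within_variance(points, assign, k):
    """Sum of squared distances from each point to its class centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, a in zip(points, assign) if a == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


def tabu_partition(points, k, iters=100, tenure=3, seed=0):
    """Tabu search over single-object transfers (hypothetical parameters)."""
    rng = random.Random(seed)
    assign = [rng.randrange(k) for _ in points]
    best, best_cost = assign[:], within_variance(points, assign, k)
    tabu = deque(maxlen=tenure)  # indices of recently transferred objects
    for _ in range(iters):
        candidate = None
        for i in range(len(points)):
            if i in tabu:
                continue
            for c in range(k):
                if c == assign[i]:
                    continue
                trial = assign[:]
                trial[i] = c
                cost = within_variance(points, trial, k)
                if candidate is None or cost < candidate[0]:
                    candidate = (cost, i, c)
        if candidate is None:
            break
        cost, i, c = candidate
        # Accept the best non-tabu transfer even if it worsens the criterion.
        assign[i] = c
        tabu.append(i)
        if cost < best_cost:
            best, best_cost = assign[:], cost
    return best, best_cost
```

Accepting the best admissible move even when it increases the within variance is what lets the search leave a local minimum; the best partition seen so far is tracked separately.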

Optimal expected-distance separating halfspace

Published: 2004/10/05

Emilio Carrizosa

Frank Plastria

Data-Mining, Global Optimization Applications, Statistics discriminant analysis, norm-distance to hyperplane, separating halfspace

One recently proposed criterion to separate two datasets in discriminant analysis is to use a hyperplane which minimises the sum of distances to it from all the misclassified data points. Here all distances are supposed to be measured by way of some fixed norm, while misclassification means lying on the wrong side of the hyperplane, …
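The criterion itself is easy to evaluate for a given hyperplane. The sketch below assumes labels in {+1, -1}, a hyperplane written as $w \cdot x = b$ with the +1 class on the side $w \cdot x \ge b$, and a choice among the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms; these conventions and the function names are illustrative assumptions. It uses the standard fact that the distance from a point $p$ to the hyperplane, measured in a norm, is $|w \cdot p - b|$ divided by the dual norm of $w$. It only scores a hyperplane and does not search for the optimal one.

```python
import math


def dual_norm(w, norm):
    """Dual norm of w for the norm used to measure point distances."""
    if norm == "l1":      # the dual of l1 is l-infinity
        return max(abs(c) for c in w)
    if norm == "l2":      # l2 is self-dual
        return math.sqrt(sum(c * c for c in w))
    if norm == "linf":    # the dual of l-infinity is l1
        return sum(abs(c) for c in w)
    raise ValueError("unsupported norm: " + norm)


def misclassification_distance(points, labels, w, b, norm="l2"):
    """Sum of norm-distances to the hyperplane over misclassified points."""
    scale = dual_norm(w, norm)
    if scale == 0.0:
        raise ValueError("w must be nonzero")
    total = 0.0
    for p, y in zip(points, labels):
        margin = y * (sum(wi * pi for wi, pi in zip(w, p)) - b)
        if margin < 0.0:  # the point lies strictly on the wrong side
            total += -margin / scale
    return total
```

For example, with `w = (1, 1)` and `b = 0`, a point `(1, 1)` labelled -1 contributes $\sqrt{2}$ under the $\ell_2$ norm, 2 under $\ell_1$, and 1 under $\ell_\infty$.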

A direct formulation for sparse PCA using semidefinite programming

Published: 2004/07/07

Alexandre d'Aspremont

Laurent El Ghaoui

Michael I. Jordan

G. R. G. Lanckriet

Finance and Economics, Semi-definite Programming, Statistics pca, semidefinite programming, sparsity

We examine the problem of approximating, in the Frobenius-norm sense, a symmetric positive semidefinite matrix by a rank-one matrix, with an upper bound on the cardinality of its eigenvector. The problem arises in the decomposition of a covariance matrix into sparse factors, and has wide applications ranging from biology to finance. We use a modification …

When LP is not a good idea – using structure in polyhedral optimization problems

Published: 2003/12/10

Michael Osborne

Convex Optimization, Linear Programming, Statistics active set method, homotopy method, l_1 estimation, line-search, linear programming, rank regression, simplicial algorithm, structure functional, support vector regression, the lasso

It has been known for almost 50 years that the discrete l_1 approximation problem can be solved effectively by linear programming. However, improved algorithms involve a step which can be interpreted as a line search, and which is not part of the standard LP solution procedures. l_1 provides the simplest example of a class of …

A New Computational Approach to Density Estimation with Semidefinite Programming

Published: 2003/11/29, Updated: 2003/12/19

Tadayoshi Fushiki

Takashi Tsuchiya

Shingo Horiuchi

Data-Mining, Semi-definite Programming, Statistics aic, density estimation, maximum likelihood estimation, semidefinite programming, statistics

Density estimation is a classical and important problem in statistics. The aim of this paper is to develop a new computational approach to density estimation based on semidefinite programming (SDP), a new technology developed in optimization in the last decade. We express a density as the product of a nonnegative polynomial and a base density …

Quadratic interior-point methods in statistical disclosure control

Published: 2003/11/28

Jordi Castro

Network Optimization, Quadratic Programming, Statistics

The safe dissemination of statistical tabular data is one of the main concerns of National Statistical Institutes (NSIs). Although each cell of the tables is made up of the aggregated information of several individuals, the statistical confidentiality can be violated. NSIs must guarantee that no individual information can be derived from the released tables. One …