Exact and Approximation Algorithms for Sparse PCA

Sparse Principal Component Analysis (SPCA) is designed to enhance the interpretability of traditional Principal Component Analysis (PCA) by optimally selecting a subset of features that comprise the first principal component. Given the NP-hard nature of SPCA, most current approaches resort to approximate solutions, typically achieved through tractable semidefinite programs (SDPs) or heuristic methods. To solve SPCA to optimality, we propose two exact mixed-integer SDPs (MISDPs) and an arbitrarily equivalent mixed-integer linear program (MILP). The MISDPs allow us to design an effective branch-and-cut algorithm with closed-form cuts that do not need to solve dual problems. For the proposed mixed-integer formulations, we further derive the theoretical optimality gaps of their continuous relaxations.
Besides, we apply the greedy and local search algorithms to solving SPCA and derive their first-known approximation ratios. Our numerical experiments reveal that the exact methods we developed can efficiently find optimal solutions for datasets containing hundreds of features. Furthermore, our approximation algorithms demonstrate both scalability and near-optimal performance when benchmarked on larger datasets, specifically those with thousands of features.

Article

Download