High-Rank Matrix Completion by Integer Programming

In the High-Rank Matrix Completion (HRMC) problem, we are given a collection of n data points, arranged into columns of a matrix X, and each of the data points is observed only on a subset of its coordinates. The data points are assumed to be concentrated near a union of low-dimensional subspaces. The goal of HRMC is to recover the missing elements of the data matrix X. State-of-the-art algorithms for HRMC can fail on instances with a large amount of missing data or if the data matrix X is nearly full-rank. We propose a novel integer programming based approach for HRMC. The approach is based on dynamically determining a set of candidate subspaces and optimally assigning points to selected subspaces. The problem structure is identical to the classical facility-location problem, with subspaces playing the role of facilities and data points that of customers. We propose a column-generation approach for identifying candidate subspaces combined with a Benders decomposition approach for solving the linear programming relaxation of the formulation. An empirical study demonstrates that the proposed approach can achieve better clustering accuracy than state-of-the-art methods when the data is high-rank, the percentage of missing data is high, or there are a small number of data points in each subspace.



View High-Rank Matrix Completion by Integer Programming