On the power of linear programming for K-means clustering

In a previous work, we introduced a new linear programming (LP) relaxation for K-means clustering.
In this paper, we further investigate the theoretical properties of this relaxation. We focus on
K-means clustering with two clusters, which is an NP-hard problem. As is evident from our numerical
experiments on both synthetic and real-world data sets, the proposed LP relaxation is almost
always tight; i.e., its optimal solution is feasible for the original nonconvex problem. To better
understand this unexpected behaviour, we obtain sufficient conditions under which the LP relaxation
is tight. We further analyze these sufficient conditions when the input is generated according
to a popular stochastic model and obtain recovery guarantees for the LP relaxation. Finally, we
construct a family of inputs for which the LP relaxation is never tight.
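For context, a minimal sketch of the problem being relaxed (a standard reformulation, not the specific LP studied in the paper): K-means with two clusters asks for a partition C_1, C_2 of the points x_1, ..., x_n that minimizes the within-cluster sum of squared distances to the cluster centroids, which, after eliminating the centroids, can be written in the pairwise form

\[
\min_{C_1 \sqcup C_2 = \{1,\dots,n\}} \;\; \sum_{j=1}^{2} \frac{1}{2\,|C_j|} \sum_{i,\,i' \in C_j} \| x_i - x_{i'} \|^2 .
\]

LP relaxations of K-means are typically obtained by linearizing an exact formulation of this type over a polytope of co-membership variables; the precise constraint set of the relaxation analyzed here is specified in the paper rather than in this abstract.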
