The minimum sum-of-squares clustering problem (MSSC) consists in partitioning n observations into k clusters in order to minimize the sum of squared distances from the points to the centroid of their cluster. In this paper, we propose an exact algorithm for the MSSC problem based on the branch-and-bound technique. The lower bound is computed by using a cutting-plane procedure where valid inequalities are iteratively added to the Peng-Wei SDP relaxation. The upper bound is computed with the constrained version of k-means where the initial centroids are extracted from the solution of the SDP relaxation. In the branch-and-bound procedure, we incorporate instance-level must-link and cannot-link constraints to express knowledge about which instances should or should not be grouped together. We manage to reduce the size of the problem at each level preserving the structure of the SDP problem itself. The obtained results show that the approach allows to successfully solve for the first time real-world instances up to 4000 data points.
Citation
Univ. Rome Tor Vergata, April 2021
Article
View SOS-SDP: an Exact Solver for Minimum Sum-of-Squares Clustering