On the power of linear programming for K-means clustering

In a previous work, we introduced a new linear programming (LP) relaxation for K-means clustering. In this paper, we further investigate the theoretical properties of this relaxation. We focus on K-means clustering with two clusters, which is an NP-hard problem. As evident from our numerical experiments with both synthetic and real-world data sets, the proposed … Read more

A stochastic alternating balance k-means algorithm for fair clustering

In the application of data clustering to human-centric decision-making systems, such as loan applications and advertisement recommendations, the clustering outcome might discriminate against people across different demographic groups, leading to unfairness. A natural conflict occurs between the cost of clustering (in terms of distance to cluster centers) and the balance representation of all demographic groups … Read more

Robustification of the k-Means Clustering Problem and Tailored Decomposition Methods: When More Conservative Means More Accurate

k-means clustering is a classic method of unsupervised learning with the aim of partitioning a given number of measurements into k clusters. In many modern applications, however, this approach suffers from unstructured measurement errors because the k-means clustering result then represents a clustering of the erroneous measurements instead of retrieving the true underlying clustering structure. … Read more

Joint Pricing and Production: A Fusion of Machine Learning and Robust Optimization

We integrate machine learning with distributionally robust optimization to address a two-period problem for the joint pricing and production of multiple items. First, we generalize the additive demand model to capture both cross-product and cross-period effects as well as the demand dependence across periods. Next, we apply K-means clustering to the demand residual mapping based … Read more

Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

Plain vanilla K-means clustering is prone to produce unbalanced clusters and suffers from outlier sensitivity. To mitigate both shortcomings, we formulate a joint outlier-detection and clustering problem, which assigns a prescribed number of datapoints to an auxiliary outlier cluster and performs cardinality-constrained K-means clustering on the residual dataset. We cast this problem as a mixed-integer … Read more

Scenario Reduction Revisited: Fundamental Limits and Guarantees

The goal of scenario reduction is to approximate a given discrete distribution with another discrete distribution that has fewer atoms. We distinguish continuous scenario reduction, where the new atoms may be chosen freely, and discrete scenario reduction, where the new atoms must be chosen from among the existing ones. Using the Wasserstein distance as measure … Read more

Pseudo basic steps: Bound improvement guarantees from Lagrangian decomposition in convex disjunctive programming

An elementary, but fundamental, operation in disjunctive programming is a basic step, which is the intersection of two disjunctions to form a new disjunction. Basic steps bring a disjunctive set in regular form closer to its disjunctive normal form and, in turn, produce relaxations that are at least as tight. An open question is: What … Read more

Approximating K-means-type clustering via semidefinite programming

One of the fundamental clustering problems is to assign $n$ points into $k$ clusters based on the minimal sum-of-squares(MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP). We show that our 0-1 SDP model provides an unified framework for … Read more