k-means clustering – Optimization Online

On the power of linear programming for K-means clustering

Published: 2024/02/03

Aida Khajavirad

Categories: Combinatorial Optimization, Integer Programming, Polyhedra
Tags: k-means clustering, linear programming relaxation, Ratio-cut polytope, recovery guarantee, Tightness

In a previous work, we introduced a new linear programming (LP) relaxation for K-means clustering. In this paper, we further investigate the theoretical properties of this relaxation. We focus on K-means clustering with two clusters, which is an NP-hard problem. As evident from our numerical experiments with both synthetic and real-world data sets, the proposed …

A stochastic alternating balance k-means algorithm for fair clustering

Published: 2021/06/02

Categories: Convex and Nonsmooth Optimization, Data-Mining, Nonlinear Optimization
Tags: bi-objective optimization, data mining, fairness, k-means clustering, pareto front, unsupervised machine learning

In the application of data clustering to human-centric decision-making systems, such as loan applications and advertisement recommendations, the clustering outcome might discriminate against people across different demographic groups, leading to unfairness. A natural conflict occurs between the cost of clustering (in terms of distance to cluster centers) and the balanced representation of all demographic groups …

Robustification of the k-Means Clustering Problem and Tailored Decomposition Methods: When More Conservative Means More Accurate

Published: 2020/05/17, Updated: 2022/04/27

Categories: (Mixed) Integer Nonlinear Programming, Robust Optimization
Tags: alternating direction method, gamma-robustness, k-means clustering, robust optimization, strict robustness

k-means clustering is a classic method of unsupervised learning with the aim of partitioning a given number of measurements into k clusters. In many modern applications, however, this approach suffers from unstructured measurement errors because the k-means clustering result then represents a clustering of the erroneous measurements instead of retrieving the true underlying clustering structure. …
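
As a point of reference for the plain (non-robust) problem described above, the following is a minimal sketch of the classic Lloyd heuristic for partitioning measurements into k clusters. It is not the robustified method of the paper. The function name `lloyd_kmeans`, the toy data, and the choice to initialize the centers with the first k points are assumptions made only for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, max_iterations=100):
    """Heuristic k-means: alternate assignment and center updates.

    Assumption: centers are initialized with the first k points, which
    keeps the sketch deterministic; practical codes use better seeding.
    """
    centers = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the heuristic has converged.
        assignment = new_assignment
        # Update step: each center becomes the mean of its cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # Keep the old center if a cluster is empty.
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centers


# Hypothetical toy data: two well-separated groups of measurements.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = lloyd_kmeans(points, k=2)
print(labels)
```

Because the heuristic clusters whatever measurements it is given, any error in `points` propagates directly into the result, which is the sensitivity the abstract refers to.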

Joint Pricing and Production: A Fusion of Machine Learning and Robust Optimization

Published: 2019/12/31

Categories: Applications - OR and Management Sciences, Robust Optimization
Tags: distributionally robust optimization, inventory control, k-means clustering, multi-item, pricing

We integrate machine learning with distributionally robust optimization to address a two-period problem for the joint pricing and production of multiple items. First, we generalize the additive demand model to capture both cross-product and cross-period effects as well as the demand dependence across periods. Next, we apply K-means clustering to the demand residual mapping based …

Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

Published: 2017/05/22

Categories: Approximation Algorithms, Data-Mining, Linear, Cone and Semidefinite Programming
Tags: k-means clustering, optimality guarantee, outlier detection, semidefinite programming

Plain vanilla K-means clustering is prone to produce unbalanced clusters and suffers from outlier sensitivity. To mitigate both shortcomings, we formulate a joint outlier-detection and clustering problem, which assigns a prescribed number of datapoints to an auxiliary outlier cluster and performs cardinality-constrained K-means clustering on the residual dataset. We cast this problem as a mixed-integer …

Scenario Reduction Revisited: Fundamental Limits and Guarantees

Published: 2017/01/15

Categories: Linear, Cone and Semidefinite Programming, Robust Optimization, Stochastic Programming
Tags: constant-factor approximation algorithm, k-means clustering, k-median clustering, scenario reduction, wasserstein distance

The goal of scenario reduction is to approximate a given discrete distribution with another discrete distribution that has fewer atoms. We distinguish continuous scenario reduction, where the new atoms may be chosen freely, and discrete scenario reduction, where the new atoms must be chosen from among the existing ones. Using the Wasserstein distance as measure …

Pseudo basic steps: Bound improvement guarantees from Lagrangian decomposition in convex disjunctive programming

Published: 2016/09/13, Updated: 2017/06/22

Categories: (Mixed) Integer Nonlinear Programming
Tags: basic step, disjunctive programming, k-means clustering, lagrangian decomposition, mixed integer conic quadratic optimization

An elementary, but fundamental, operation in disjunctive programming is a basic step, which is the intersection of two disjunctions to form a new disjunction. Basic steps bring a disjunctive set in regular form closer to its disjunctive normal form and, in turn, produce relaxations that are at least as tight. An open question is: What …

Approximating K-means-type clustering via semidefinite programming

Published: 2005/04/22, Updated: 2006/02/22

Categories: Approximation Algorithms, Convex Optimization, Data-Mining
Tags: approximation, k-means clustering, principal component analysis, semidefinite programming

One of the fundamental clustering problems is to assign $n$ points to $k$ clusters based on the minimal sum-of-squares (MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP) problem. We show that our 0-1 SDP model provides a unified framework for …
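
For reference, the minimal sum-of-squares objective mentioned in this abstract is commonly written as follows. The symbols $x_i$ (data points), $C_j$ (clusters) and $c_j$ (cluster centroids) are notation assumed here for illustration and are not taken from the paper:

$$
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - c_j \rVert^2,
\qquad
c_j = \frac{1}{\lvert C_j \rvert} \sum_{i \in C_j} x_i ,
$$

where $C_1,\dots,C_k$ range over all partitions of the index set $\{1,\dots,n\}$ into $k$ nonempty clusters.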