A stochastic alternating balance k-means algorithm for fair clustering

When data clustering is applied in human-centric decision-making systems, such as loan applications and advertisement recommendations, the clustering outcome might discriminate against people across different demographic groups, leading to unfairness. A natural conflict occurs between the cost of clustering (in terms of distance to cluster centers) and the balanced representation of all demographic groups …
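
The trade-off described above can be made concrete with a small sketch. The teaser does not say how the authors measure balance, so the definition used here (the smallest ratio of least-represented to most-represented group within any cluster) is an assumption, and the function names `clustering_cost` and `clustering_balance` are hypothetical.

```python
from collections import Counter

def clustering_cost(points, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(
        sum((p_j - c_j) ** 2 for p_j, c_j in zip(p, centers[l]))
        for p, l in zip(points, labels)
    )

def clustering_balance(labels, groups):
    """Assumed balance measure: min over clusters of
    (count of rarest group) / (count of most common group)."""
    all_groups = set(groups)
    per_cluster = {}
    for l, g in zip(labels, groups):
        per_cluster.setdefault(l, Counter())[g] += 1
    return min(
        min(counts[g] for g in all_groups) / max(counts[g] for g in all_groups)
        for counts in per_cluster.values()
    )

# Toy data: group "a" sits on the left, group "b" on the right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
groups = ["a", "a", "b", "b"]

# Clustering by proximity: low cost, but each cluster holds a single group.
tight_cost = clustering_cost(points, [0, 0, 1, 1], [(0, 0.5), (4, 0.5)])
tight_balance = clustering_balance([0, 0, 1, 1], groups)

# Clustering that mixes the groups: perfectly balanced, but much higher cost.
mixed_cost = clustering_cost(points, [0, 1, 0, 1], [(2, 0), (2, 1)])
mixed_balance = clustering_balance([0, 1, 0, 1], groups)
```

On this toy data the proximity-based clustering has the lower cost and the worse balance, while the mixed clustering reverses both, which is the conflict the abstract refers to.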

Enclosing Machine Learning

This report introduces a new machine learning paradigm for data mining called enclosing machine learning. The method draws on the human cognition process and tries to imitate, from a macroscopic view, its two basic principles: cognizing things of the same kind, recognizing things of a new kind …

Detecting relevant variables and interactions for classification in Support Vector Machines

The widely used Support Vector Machine (SVM) method has been shown to yield good results in Supervised Classification problems. The Binarized SVM (BSVM) is a variant that automatically detects which variables are, by themselves, most relevant for the classifier. In this work, we extend the BSVM introduced by the authors to a method …

A Column Generation Approach for Support Vector Machines

The widely used Support Vector Machine (SVM) method has been shown to yield good results in Supervised Classification problems. Other methods such as Classification Trees have become more popular among practitioners than SVM thanks to their interpretability, which is an important issue in Data Mining. In this work, we propose an SVM-based method that automatically detects …

A Tabu Search Algorithm for Partitioning

We present an original method for partitioning by automatic classification, using the optimization technique of tabu search. The method uses a classical tabu search scheme based on transfers to minimize the within variance; it places in the tabu list the indicator of the object transferred. This method is compared with two stochastic …
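
A transfer-based tabu search of the kind described above might look roughly like the following sketch. It is not the authors' implementation: the function names, the tabu tenure, the iteration budget, and the aspiration rule (a tabu move is allowed when it beats the best cost found so far) are all assumptions made for illustration.

```python
from collections import deque

def within_variance(points, labels, k):
    """Sum of squared distances of each point to the mean of its cluster."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        mean = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        total += sum((p[j] - mean[j]) ** 2 for p in members for j in range(dim))
    return total

def tabu_partition(points, labels, k, iterations=50, tenure=3):
    """Tabu search over single-object transfers (hypothetical parameters)."""
    labels = list(labels)
    best_labels, best_cost = list(labels), within_variance(points, labels, k)
    tabu = deque(maxlen=tenure)  # indices of recently transferred objects
    for _ in range(iterations):
        candidates = []
        for i in range(len(points)):
            for c in range(k):
                if c == labels[i]:
                    continue
                trial = labels[:i] + [c] + labels[i + 1:]
                cost = within_variance(points, trial, k)
                # Assumed aspiration rule: a tabu object may still move
                # if the move improves on the best cost seen so far.
                if i not in tabu or cost < best_cost:
                    candidates.append((cost, i, c))
        if not candidates:
            break
        # Take the best admissible transfer, even if it worsens the cost.
        cost, i, c = min(candidates)
        labels[i] = c
        tabu.append(i)
        if cost < best_cost:
            best_labels, best_cost = list(labels), cost
    return best_labels, best_cost

# Two obvious groups on a line, started from a deliberately bad partition.
data = [(0.0,), (0.1,), (5.0,), (5.1,)]
found_labels, found_cost = tabu_partition(data, [0, 1, 0, 1], k=2)
```

Recomputing the within variance from scratch for every candidate transfer keeps the sketch short; a practical version would update the cost incrementally.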

A Mixed-Integer Programming Approach to Multi-Class Data Classification Problem

This paper presents a new data classification method based on mixed-integer programming. Traditional approaches that are based on partitioning the data sets into two groups perform poorly for multi-class data classification problems. The proposed approach is based on the use of hyper-boxes for defining boundaries of the classes that include all or some of the …

Computation of Minimum Volume Covering Ellipsoids

We present a practical algorithm for computing the minimum volume n-dimensional ellipsoid that must contain m given points $a_1, \ldots, a_m \in \mathbb{R}^n$. This convex constrained problem arises in a variety of applied computational settings, particularly in data mining and robust statistics. Its structure makes it particularly amenable to solution by interior-point methods, and it …
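
For orientation, one standard way to write the minimum volume covering ellipsoid problem as a convex program is given below. The teaser does not show the authors' exact formulation, so the variables $M$ and $z$ and this particular parametrization are assumptions; it also assumes the points affinely span $\mathbb{R}^n$ so that the optimal ellipsoid has finite, positive volume.

```latex
% Ellipsoid parametrized as E = { x \in R^n : \| M x - z \|_2 \le 1 },  M symmetric positive definite.
% Its volume is proportional to \det(M^{-1}), so minimizing volume means minimizing -\ln\det M.
\begin{aligned}
\min_{M,\; z} \quad & -\ln \det M \\
\text{s.t.} \quad   & \| M a_i - z \|_2 \le 1, \qquad i = 1, \ldots, m, \\
                    & M \succ 0 .
\end{aligned}
```

The objective is convex in $M$ and each constraint is a convex (second-order cone) constraint in $(M, z)$, which is consistent with the abstract's remark that the problem suits interior-point methods.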

Semismooth Support Vector Machines

The linear support vector machine can be posed as a quadratic program in a variety of ways. In this paper, we look at a formulation using the two-norm for the misclassification error that leads to a positive definite quadratic program with a single equality constraint when the Wolfe dual is taken. The quadratic term is …
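
As a rough illustration of why a two-norm penalty yields a positive definite dual, the textbook two-norm soft-margin SVM and its dual are sketched below. The symbols ($x_i$, $y_i$, $w$, $b$, $\xi$, $\alpha$, and the penalty weight $C$) are assumptions, and the paper's own notation and scaling may differ.

```latex
% Primal: linear SVM with squared (two-norm) slack penalty.
\begin{aligned}
\min_{w,\, b,\, \xi} \quad & \tfrac{1}{2}\, \|w\|_2^2 + \tfrac{C}{2} \sum_{i=1}^{m} \xi_i^2 \\
\text{s.t.} \quad & y_i \,(w^{\top} x_i + b) \ge 1 - \xi_i, \qquad i = 1, \ldots, m .
\end{aligned}

% Dual: the term \delta_{ij}/C adds a positive multiple of the identity to the
% Hessian, making it positive definite; the only equality constraint is
% \sum_i y_i \alpha_i = 0 (the multipliers also satisfy simple bounds \alpha \ge 0).
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{m} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j\, y_i y_j
    \Bigl( x_i^{\top} x_j + \tfrac{\delta_{ij}}{C} \Bigr) \\
\text{s.t.} \quad & \sum_{i=1}^{m} y_i \alpha_i = 0, \qquad \alpha_i \ge 0, \quad i = 1, \ldots, m .
\end{aligned}
```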