The field of Statistical Disclosure Control aims to reduce the risk of re-identifying an individual when data are disseminated, and it is one of the main concerns of national statistical agencies. Operations Research (OR) techniques have been widely used for the protection of tabular data, but not for microdata (i.e., files of individuals and their attributes). To our knowledge, this work presents the first application of OR techniques to the microaggregation problem, which is considered one of the best methods for microdata protection and is known to be NP-hard. The new heuristic approach is based on a column generation scheme and, unlike previous (primal) heuristics for microaggregation, it also provides a lower bound on the optimal microaggregation. Computational results on real data sets commonly used in the literature show that the approach often finds solutions with small optimality gaps and substantially improves on the most popular heuristics in the literature.
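To make the terms above concrete, the sketch below assumes the standard formulation of microaggregation: partition the records into groups of at least k records each, replace every record by its group centroid, and minimize the within-group sum of squared errors (SSE). Any feasible partition yields an upper bound on the optimum. A lower bound (such as the one the column generation scheme provides) lets one compute a relative gap. The grouping heuristic here is a hypothetical toy (sort on the first attribute and cut into consecutive blocks). It is neither the paper's column generation method nor any of the published heuristics, and all function names are illustrative.

```python
from typing import List, Sequence

Record = Sequence[float]
Partition = List[List[int]]  # each group is a list of record indices


def centroid(points: List[Record]) -> List[float]:
    """Attribute-wise mean of a group of records."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def sse(data: List[Record], partition: Partition) -> float:
    """Within-group sum of squared distances to each group's centroid."""
    total = 0.0
    for group in partition:
        pts = [data[i] for i in group]
        c = centroid(pts)
        total += sum((p[j] - c[j]) ** 2 for p in pts for j in range(len(c)))
    return total


def is_feasible(partition: Partition, n: int, k: int) -> bool:
    """Every record appears exactly once and every group has at least k records."""
    covered = sorted(i for g in partition for i in g)
    return covered == list(range(n)) and all(len(g) >= k for g in partition)


def sorted_chunk_heuristic(data: List[Record], k: int) -> Partition:
    """Hypothetical toy heuristic: sort by the first attribute and cut into
    consecutive blocks of k records; the last block absorbs the remainder."""
    n = len(data)
    if n < k:
        raise ValueError("need at least k records")
    order = sorted(range(n), key=lambda i: data[i][0])
    n_groups = n // k
    groups = [order[g * k:(g + 1) * k] for g in range(n_groups)]
    groups[-1].extend(order[n_groups * k:])
    return groups


def relative_gap(upper: float, lower: float) -> float:
    """Relative optimality gap between a feasible SSE and a lower bound."""
    return (upper - lower) / upper if upper > 0 else 0.0


if __name__ == "__main__":
    data = [(1.0, 2.0), (1.2, 1.9), (5.0, 5.1), (5.3, 4.8),
            (9.0, 9.0), (9.1, 8.7), (9.4, 9.2)]
    part = sorted_chunk_heuristic(data, k=2)
    print(part, is_feasible(part, len(data), 2), sse(data, part))
```

In this sketch the SSE of any feasible partition plays the role of a primal (upper) bound. A small value of `relative_gap(upper, lower)` against a valid lower bound certifies that the partition is close to optimal, which is the kind of guarantee purely primal heuristics cannot give.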
IASI Research Report 20-02 www.iasi.cnr.it
An algorithm for the Microaggregation problem using Column Generation