One of the main concerns of national statistical agencies (NSAs) is to publish tabular data. NSAs have to guarantee that no private information from specific respondents can be disclosed from the released tables. The purpose of the statistical disclosure control field is to avoid such leaks of private information. Most protection techniques for tabular data rely on the formulation of a large mathematical programming problem, whose solution is computationally expensive even for tables of moderate size. One of the emerging techniques in this field is controlled tabular adjustment (CTA). Although CTA is more efficient than other protection methods, the resulting mixed integer linear programming (MILP) problems are still challenging. In this work, a heuristic approach based on block coordinate descent decomposition is designed and applied to large hierarchical and general CTA instances. This approach is compared with CPLEX, a state-of-the-art MILP solver. Our results, from both synthetic and real tables with up to 1,200,000 cells, 100,000 of them sensitive (resulting in MILP instances of up to 2,400,000 continuous variables, 100,000 binary variables, and 475,000 constraints), show that the heuristic block coordinate descent behaves better in practice than a state-of-the-art solver: for large hierarchical instances it provides significantly better solutions within a specified realistic time limit, as NSAs require in real-world settings.
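For orientation, the CTA problem mentioned above is usually written in the literature as the MILP sketched below. The abstract itself does not give the formulation, so every symbol here is an assumption introduced for illustration: $a$ is the original table satisfying $Aa = b$, $S$ is the set of sensitive cells, $w_i$ are cell weights, $z_i^+$ and $z_i^-$ are the upward and downward deviations of cell $i$, $u_{z_i}$ and $l_{z_i}$ are bounds on those deviations, and $upl_i$, $lpl_i$ are the upper and lower protection levels. The sketch has two continuous variables per cell and one binary variable per sensitive cell, which agrees with the instance sizes quoted above (2 × 1,200,000 = 2,400,000 continuous and 100,000 binary variables).

```latex
\begin{align*}
\min_{z^+,\,z^-,\,y} \quad & \sum_{i=1}^{n} w_i \,(z_i^+ + z_i^-) \\
\text{s.t.} \quad & A\,(z^+ - z^-) = 0, \\
& 0 \le z_i^+ \le u_{z_i}, \quad 0 \le z_i^- \le -l_{z_i}, && i \notin S, \\
& upl_i \, y_i \le z_i^+ \le u_{z_i} \, y_i, && i \in S, \\
& lpl_i \,(1 - y_i) \le z_i^- \le -l_{z_i} \,(1 - y_i), && i \in S, \\
& y_i \in \{0, 1\}, && i \in S.
\end{align*}
```

Under this reading, $y_i = 1$ forces sensitive cell $i$ up by at least $upl_i$ and $y_i = 0$ forces it down by at least $lpl_i$. A block coordinate descent heuristic would, in general terms, partition the binary variables $y$ into blocks, fix all blocks but one at their current values, solve the smaller MILP over the free block, and cycle through the blocks until no further improvement is found. How the report forms the blocks and obtains a starting point is not stated in the abstract.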
Citation
J.A. González, J. Castro, A heuristic block coordinate descent approach for controlled tabular adjustment, Research Report DR 2010/06, Dept. of Statistics and Operations Research, Universitat Politècnica de Catalunya, 2010.