The safe dissemination of statistical tabular data is one of the main concerns of National Statistical Institutes (NSIs). Although each cell of the tables is made up of the aggregated information of several individuals, the statistical confidentiality can be violated. NSIs must guarantee that no individual information can be derived from the released tables. One widely used type of methods to reduce the disclosure risk is based on the perturbation of the cell values. We consider a new controlled perturbation method which, given a set of tables to be protected, finds the closest safe ones—thus reducing the information loss while preserving confidentiality. This approach means solving a quadratic optimization problem with a much larger number of variables than constraints. Real instances can provide problems with millions of variables. We show that interior-point methods are an effective choice for that model, and, also, that specialized algorithms which exploit the problem structure can be faster than state-of-the art general solvers. Computational results are presented for instances of up to 1000000 variables.
Citation
Research Report DR 2003/10, Dept. of Statistics and Operations Research, Universitat Politècnica de Catalunya, May 2003