Data quality in population surveys suffers from missing responses. We use combinatorial optimization to create a complete and coherent data set. The methods are based on the widespread nearest neighbor hot deck imputation method that replaces the missing values with observed values from a close unit, the so-called donor. As a repeated use of donors may lead to distortions in the statistics, an additional constraint on the maximum number of donor re-uses is introduced. It is shown how this problem can be modeled as a maximum weighted b-matching problem. Based on this theoretical analysis, we implement a cost scaling, network simplex and successive shortest path algorithm for computing a globally optimal solution. These outperform the available and presently used implementations in terms of solution quality and running time. Statistical imputation is presented as a novel application of Operations Research, the available methodology is improved by a concise theoretical treatment, and fast implementations are provided.
Article
View Exact solution of the donor-limited nearest neighbor hot deck imputation problem