Missing Value Imputation via Mathematical Optimization with Instance-and-Feature Neighborhoods

Datasets collected for analysis often contain incomplete instances, that is, instances in which some feature values are missing. Because many statistical analyses and machine learning algorithms require complete datasets, missing values must be imputed beforehand. Bertsimas et al. (2018) proposed a high-performance imputation method that combines machine learning with mathematical optimization. We substantially revise their nearest-neighbor-based imputation method so that it uses neighborhoods of features in addition to neighborhoods of data instances. Specifically, we first formulate an optimization model for missing value imputation that uses these instance-and-feature neighborhoods. We then design an alternating optimization algorithm that finds high-quality solutions to this model. We also develop a warm-start strategy that efficiently computes a sequence of solutions for various neighborhood sizes. Experimental results demonstrate the high imputation accuracy of our method with instance-and-feature neighborhoods and the computational efficiency of our alternating optimization algorithm with the warm-start strategy.
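The abstract does not state the model explicitly, so the following is only a minimal sketch under our own assumptions, not the authors' formulation. We assume an objective that sums squared differences between each instance and its k nearest instances and between each feature (column) and its h nearest features, with the data already standardized column-wise so that columns are comparable. The alternating scheme (a) fixes the current values and picks the nearest neighborhoods, then (b) fixes the neighborhoods and updates each missing entry to its exact coordinate minimizer. The warm start reuses the solution for one neighborhood size as the starting point for the next. The names `impute`, `impute_path`, `objective`, and `_nearest` are hypothetical.

```python
import numpy as np


def _nearest(A: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k rows of A closest to each row (squared Euclidean), excluding itself."""
    D = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(D, np.inf)
    return np.argsort(D, axis=1)[:, :k]


def objective(X: np.ndarray, N: np.ndarray, M: np.ndarray) -> float:
    """Assumed objective: rows vs. their neighbor rows plus columns vs. their neighbor columns."""
    inst = sum(((X[i] - X[N[i]]) ** 2).sum() for i in range(X.shape[0]))
    feat = sum(((X[:, [p]] - X[:, M[p]]) ** 2).sum() for p in range(X.shape[1]))
    return float(inst + feat)


def impute(X_obs, k, h, X_init=None, n_iter=20):
    """Hypothetical alternating scheme; assumes the columns of X_obs are standardized."""
    mask = np.isnan(X_obs)
    n, m = X_obs.shape
    # Start from column means, or from a warm-start solution if one is given.
    if X_init is None:
        X = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)
    else:
        X = X_init.copy()
    history = []
    for _ in range(n_iter):
        # (a) Values fixed: each instance and each feature picks its nearest neighbors.
        N = _nearest(X, k)    # n x k instance neighborhoods
        M = _nearest(X.T, h)  # m x h feature neighborhoods
        # Reverse lists: rows i' with i in N[i'], and columns p' with p in M[p'].
        rev_N = [[] for _ in range(n)]
        for i in range(n):
            for j in N[i]:
                rev_N[j].append(i)
        rev_M = [[] for _ in range(m)]
        for p in range(m):
            for q in M[p]:
                rev_M[q].append(p)
        rev_N = [np.array(r, dtype=int) for r in rev_N]
        rev_M = [np.array(r, dtype=int) for r in rev_M]
        # (b) Neighborhoods fixed: the objective is quadratic in each missing entry,
        # and its exact coordinate minimizer is the mean of every value it is compared with.
        for i, p in zip(*np.nonzero(mask)):
            partners = np.concatenate(
                [X[N[i], p], X[rev_N[i], p], X[i, M[p]], X[i, rev_M[p]]]
            )
            X[i, p] = partners.mean()
        history.append(objective(X, N, M))
    return X, history


def impute_path(X_obs, ks, h, n_iter=20):
    """Warm start: solve for each k in ks, starting from the previous k's solution."""
    path, X = {}, None
    for k in ks:
        X, _ = impute(X_obs, k, h, X_init=X, n_iter=n_iter)
        path[k] = X
    return path
```

Because step (a) minimizes the assumed objective over neighborhoods with the values fixed, and step (b) minimizes it over each missing entry with the neighborhoods fixed, the recorded objective values should not increase across iterations (up to floating-point rounding). Processing `ks` in increasing order is one natural choice for the warm-start path; that ordering is our assumption, not something the abstract specifies.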
