In optimization-based classification model selection, for example in linear programming formulations, a standard approach is to penalize the L1 norm of some linear functional in order to select sparse models. Instead, we propose a novel integer linear program for sparse classifier selection, generalizing the minimum disagreement hyperplane problem, whose complexity has been investigated in computational learning theory. Specifically, our mixed-integer program seeks a separating hyperplane that minimizes empirical error plus an L0 (cardinality) penalty. We show that common "soft margin" linear programming formulations for robust classification are equivalent to a continuous relaxation of our model. Since this initial continuous relaxation is weak, we develop a tighter relaxation, based on novel cutting planes, that better approximates the integer optimum. We describe a boosting algorithm, based on linear programming with dynamic generation of cuts and columns, that solves this relaxation. We demonstrate the classification performance of the proposed algorithm with experimental results, and justify our choice of parameters using a minimum description length (compression) interpretation of learning.
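To make the formulation concrete, the following is a minimal sketch of the kind of mixed-integer program described above; the notation (w, b, z, v and the constants M, U, C) is assumed here for illustration and need not match the report's own. Given training data $(x_i, y_i) \in \mathbb{R}^n \times \{-1, +1\}$ for $i = 1, \dots, m$, let binary $z_i$ flag a misclassified point and binary $v_j$ flag a feature with nonzero weight:

\begin{align*}
\min_{w,\, b,\, z,\, v} \quad & \sum_{i=1}^{m} z_i \;+\; C \sum_{j=1}^{n} v_j \\
\text{s.t.} \quad & y_i \left( w^{\top} x_i - b \right) \;\ge\; 1 - M z_i, \qquad i = 1, \dots, m, \\
& -U v_j \;\le\; w_j \;\le\; U v_j, \qquad\quad\;\; j = 1, \dots, n, \\
& z_i \in \{0, 1\}, \quad v_j \in \{0, 1\},
\end{align*}

where $M, U > 0$ are sufficiently large "big-M" constants and $C > 0$ trades empirical error against sparsity, so that $\sum_j v_j$ counts the nonzero weights (the L0 penalty). Relaxing $z_i, v_j \in [0, 1]$ forces $z_i \ge \max\{0,\, (1 - y_i(w^{\top} x_i - b))/M\}$ and $v_j \ge |w_j|/U$ at optimality, turning the objective into a scaled hinge loss plus an L1 penalty $(C/U) \sum_j |w_j|$, which has exactly the shape of a soft-margin LP; this illustrates, under the assumptions of this sketch, the equivalence claimed above.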
Citation
RUTCOR Research Report #23-2009, Rutgers Center for Operations Research, 640 Bartholomew Rd., Piscataway, NJ 08854, USA, October 2009