The widely used Support Vector Machine (SVM) method has been shown to yield good results in Supervised Classification problems. Other methods, such as Classification Trees, have become more popular among practitioners than SVM thanks to their interpretability, which is an important issue in Data Mining. In this work, we propose an SVM-based method that automatically detects the most important predictor variables and, for those variables, the values that are critical for the classification. Its classification ability is comparable to that of the standard linear SVM and clearly better than that of Classification Trees. Moreover, the proposed method is robust, i.e., it is stable in the presence of outliers and invariant to changes of scale or of the measurement units of the predictor variables. The method involves solving a Linear Programming problem with a large number of decision variables, for which we use the well-known Column Generation technique. When the classification rule obtained is too complex to be interpretable, a wrapper feature selection method is applied, yielding a classification rule whose behavior differs only slightly from that of linear SVM and still remains better than that of Classification Trees.
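The abstract does not state the paper's LP formulation, so the following is only a minimal sketch of the Column Generation idea on a standard stand-in: an L1-norm linear SVM written as a Linear Programming problem with one pair of columns per predictor variable. It is not the authors' model. The helper names `solve_master` and `column_generation_svm`, the starting column, and the tolerance are hypothetical choices. The code assumes NumPy and a SciPy version whose `linprog` HiGHS backend reports dual values in `res.ineqlin.marginals`. A restricted master problem is solved over a subset of columns, and a new variable enters only if its reduced cost, computed from the duals of the margin constraints, is negative.

```python
import numpy as np
from scipy.optimize import linprog


def solve_master(X, y, cols, C):
    """Restricted master LP over the predictor columns listed in `cols`.

    Hypothetical stand-in (L1-norm linear SVM, not the paper's model):
      min  sum_j (u_j + v_j) + C * sum_i xi_i
      s.t. y_i * (sum_j (u_j - v_j) * X[i, j] + b) + xi_i >= 1
           u, v, xi >= 0, b free
    """
    n, k = X.shape[0], len(cols)
    yX = y[:, None] * X[:, cols]
    # Variable order: u (k), v (k), b (1), xi (n); rows rewritten in <= form.
    A_ub = np.hstack([-yX, yX, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    c = np.concatenate([np.ones(2 * k), [0.0], C * np.ones(n)])
    bounds = [(0, None)] * (2 * k) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    w = res.x[:k] - res.x[k:2 * k]
    b = res.x[2 * k]
    # HiGHS marginals are <= 0 for <= rows; negate them to get the
    # nonnegative duals alpha_i of the original >= margin constraints.
    alpha = -res.ineqlin.marginals
    return w, b, alpha, res.fun


def column_generation_svm(X, y, C=1.0, tol=1e-7, max_iter=100):
    d = X.shape[1]
    cols = [0]  # arbitrary starting column (assumption)
    for _ in range(max_iter):
        w, b, alpha, obj = solve_master(X, y, cols, C)
        # Pricing: the reduced costs of u_j and v_j are 1 - g_j and 1 + g_j,
        # with g_j = sum_i alpha_i * y_i * X[i, j]; one of them is negative
        # exactly when |g_j| > 1.
        scores = np.abs(X.T @ (alpha * y))
        scores[cols] = -np.inf  # columns already in the master are skipped
        j = int(np.argmax(scores))
        if scores[j] <= 1 + tol:
            break  # no improving column: master optimum solves the full LP
        cols.append(j)
    else:
        # Iteration cap reached: re-solve with every column collected so far.
        w, b, alpha, obj = solve_master(X, y, cols, C)
    w_full = np.zeros(d)
    w_full[cols] = w
    return w_full, b, obj, cols
```

For example, `column_generation_svm(X, y, C=10.0)` returns the full-length weight vector, the intercept, the LP objective, and the indices of the columns that entered the master problem. Because only columns that price out are added, the LP actually solved can contain far fewer variables than the full formulation, which is what makes Column Generation attractive when the number of decision variables is large.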
A Column Generation Approach for Support Vector Machines