In this paper we present novel data-driven optimization models for Support Vector Machines (SVM), with the aim of linearly separating two sets of points that have non-disjoint convex closures. Traditional classification algorithms assume that the training data points are always known exactly. However, real-life data are often subject to noise. To handle such uncertainty, we formulate robust models with uncertainty sets in the form of hyperrectangles or hyperellipsoids, and propose distributionally robust optimization models enforcing limits on the observations' first-order deviations along principal directions. All the formulations reduce to convex programs. The efficiency of the new classifiers is evaluated on real-world databases. Experiments show that robust classifiers are especially beneficial for data sets with a small number of observations. As the dimension of the data sets increases, the behavior of the features is gradually learned and higher levels of out-of-sample accuracy can be achieved via distributionally robust optimization methods. Overall, the proposed formulations allow a trade-off between increasing average predictive accuracy and protecting against uncertainty, relative to deterministic approaches.
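To illustrate how a robust SVM with a hyperrectangular uncertainty set reduces to a convex program, below is a minimal sketch of the standard box-uncertainty robust soft-margin formulation (not necessarily the paper's exact model): if each feature of every observation may be shifted by at most rho_j, the worst-case perturbation lowers each margin by rho^T |w|, yielding a convex hinge-type problem. The toy data, the half-widths rho, and the trade-off parameter C are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

# Hypothetical toy data: n observations, d features, labels in {-1, +1}.
rng = np.random.default_rng(0)
n, d = 40, 2
X = np.vstack([rng.normal(-1.0, 1.0, (n // 2, d)),
               rng.normal(+1.0, 1.0, (n // 2, d))])
y = np.hstack([-np.ones(n // 2), np.ones(n // 2)])

rho = 0.1 * np.ones(d)  # assumed per-feature half-widths of the hyperrectangle
C = 1.0                 # assumed soft-margin trade-off parameter

w = cp.Variable(d)
b = cp.Variable()

# Worst-case margin under box uncertainty: an adversarial perturbation
# within the hyperrectangle reduces each point's margin by rho^T |w|.
margins = cp.multiply(y, X @ w + b) - rho @ cp.abs(w)

# Robust soft-margin objective: regularizer plus worst-case hinge loss.
objective = 0.5 * cp.sum_squares(w) + C * cp.sum(cp.pos(1 - margins))
cp.Problem(cp.Minimize(objective)).solve()

print("w =", w.value, "b =", b.value)
```

Setting rho to zero recovers the deterministic soft-margin SVM, which makes the trade-off between average accuracy and protection against noise explicit in a single parameter.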
Citation
Accepted in June 2022 in Computers and Operations Research. Manuscript number: CAOR_105930