This work presents a novel optimization approach for training linear classifiers in multiclass classification tasks, focusing on a regularized and smooth Weston-Watkins support vector machine (SVM) model. We propose a Majorization-Minimization (MM) algorithm to solve the resulting Lipschitz-differentiable optimization problem. To enhance the scalability of the algorithm on large datasets, we introduce an incremental MM strategy that suitably integrates the second-order information from the MM framework within a low-complexity incremental gradient scheme. We establish convergence guarantees of the algorithm in both convex and non-convex settings and demonstrate its effectiveness through various numerical experiments.
In particular, by incorporating kernel principal component analysis and foundation models at the preprocessing stage, we show that optimizing a linear multiclass SVM with the proposed incremental MM scheme achieves results comparable to state-of-the-art deep learning methods on benchmark tasks.