# A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine Learning

Machine learning theory typically assumes that training data is unbiased and not adversarially generated. When real training data deviates from these assumptions, trained models make erroneous predictions, sometimes with disastrous effects. Robust losses, such as the huber norm are designed to mitigate the effects of such contaminated data, but they are limited to the regression context. In this paper, we show how to transform any optimization problem that arises from fitting a machine learning model into one that (1) detects and removes contaminated data from the training set while (2) simultaneously fitting the trimmed model on the uncontaminated data that remains. To solve the resulting nonconvex optimization problem, we introduce a fast stochastic proximal-gradient algorithm that incorporates prior knowledge through nonsmooth regularization. In general, our approach requires $O(n^{2/3}/\varepsilon)$ gradient evaluations to reach $\varepsilon$-accuracy and, when a certain error bound hold, the complexity improves to $O(\kappa n^{2/3}\log(1/\varepsilon))$. These rates are $n^{1/3}$ times better than those achieved by typical, full gradient methods.