Robust Regression over Averaged Uncertainty

We propose a new formulation of robust regression that integrates over all realizations of the uncertainty set and takes an averaged approach to obtain the optimal solution for the ordinary least-squares regression problem. We show that this formulation surprisingly recovers ridge regression and establishes the missing link between robust optimization and the mean squared error approaches for existing regression problems. We first prove the equivalence for four uncertainty sets (ellipsoidal, box, diamond, and budget) and provide closed-form formulations of the penalty term as a function of the sample size, the feature size, and the perturbation protection strength. We then show, on synthetic datasets with different levels of perturbation, that the averaged formulation consistently improves out-of-sample performance over the existing worst-case formulation. Importantly, the improvement grows as the perturbation level increases, confirming our method's advantage in high-noise environments. We report similar out-of-sample improvements on real-world regression problems obtained from UCI datasets.
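The central claim above, that averaging the squared loss over an uncertainty set yields a ridge penalty, can be illustrated with a small numerical sketch. It assumes, purely for illustration, that "box" uncertainty is modeled as a perturbation Delta of the n-by-p data matrix with i.i.d. entries drawn uniformly from [-rho, rho]; the paper's own definitions of the uncertainty sets and its closed-form penalties may differ. Under this assumption E[Delta] = 0 and E[Delta^T Delta] = (n rho^2 / 3) I, so E||y - (X + Delta) beta||^2 = ||y - X beta||^2 + (n rho^2 / 3) ||beta||^2, which is a ridge objective. All function names are hypothetical.

```python
import numpy as np

# Illustrative assumption (not necessarily the paper's definition): the
# "box" perturbation Delta has i.i.d. entries drawn uniformly from [-rho, rho].


def averaged_loss_mc(X, y, beta, rho, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E_Delta ||y - (X + Delta) @ beta||^2."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    delta = rng.uniform(-rho, rho, size=(n_samples, n, p))
    residuals = y - (X + delta) @ beta  # shape (n_samples, n)
    return np.mean(np.sum(residuals**2, axis=1))


def ridge_penalty_weight(n, rho):
    """Weight c in E[Delta^T Delta] = c * I for i.i.d. Uniform[-rho, rho] entries."""
    return n * rho**2 / 3.0


def averaged_loss_closed_form(X, y, beta, rho):
    """Ordinary least-squares loss plus the induced ridge penalty on beta."""
    r = y - X @ beta
    lam = ridge_penalty_weight(X.shape[0], rho)
    return r @ r + lam * (beta @ beta)


def averaged_minimizer(X, y, rho):
    """Ridge solution (X^T X + lam I)^{-1} X^T y, which minimizes the averaged loss."""
    lam = ridge_penalty_weight(X.shape[0], rho)
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(10, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=10)
    beta = np.array([0.8, -1.5, 0.3])
    rho = 0.5
    # The Monte Carlo average and the closed form should agree closely.
    print(averaged_loss_mc(X, y, beta, rho))
    print(averaged_loss_closed_form(X, y, beta, rho))
    print(averaged_minimizer(X, y, rho))
```

In this sketch the penalty weight n rho^2 / 3 depends on the sample size and the perturbation strength, which mirrors the kind of dependence described above; the exact constants for the ellipsoidal, box, diamond, and budget sets are given by the paper's own derivations, not by this example.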


