Expected Value of Matrix Quadratic Forms with Wishart distributed Random Matrices

To explore the limits of a stochastic gradient method, it may be useful to consider an example consisting of an infinite number of quadratic functions. In this context, it is appropriate to determine the expected value and the covariance matrix of the stochastic noise, i.e. the difference of the true gradient and the approximated gradient … Read more

A polyhedral study of multivariate decision trees

Decision trees are a widely used tool for interpretable machine learning. Multivariate decision trees employ hyperplanes at the branch nodes to route datapoints throughout the tree and yield more compact models than univariate trees. Recently, mixed-integer programming (MIP) has been applied to formulate the optimal decision tree problem. To strengthen MIP formulations, it is crucial … Read more

Wasserstein Regularization for 0-1 Loss

Wasserstein distributionally robust optimization (DRO) finds robust solutions by hedging against data perturbation specified by distributions in a Wasserstein ball. The robustness is linked to the regularization effect, which has been studied for continuous losses in various settings. However, existing results cannot be simply applied to the 0-1 loss, which is frequently seen in uncertainty … Read more

Sparse PCA With Multiple Components

Sparse Principal Component Analysis is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. At its heart, this involves solving a sparsity and orthogonality constrained convex maximization problem, which is extremely computationally challenging. Most existing work address sparse PCA via heuristics … Read more

Optimized convergence of stochastic gradient descent by weighted averaging

Under mild assumptions stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the arithmetic mean of all iterates converges considerably slower to the optimal solution than the iterates themselves. And also in the … Read more