Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization

We introduce a distributionally robust minimium mean square error estimation model with a Wasserstein ambiguity set to recover an unknown signal from a noisy observation. The proposed model can be viewed as a zero-sum game between a statistician choosing an estimator—that is, a measurable function of the observation—and a fictitious adversary choosing a prior—that is, … Read more

Stochastic Discrete First-order Algorithm for Feature Subset Selection

This paper addresses the problem of selecting a significant subset of candidate features to use for multiple linear regression. Bertsimas et al. (2016) recently proposed the discrete first-order (DFO) algorithm to efficiently find near-optimal solutions to this problem. However, this algorithm is unable to escape from locally optimal solutions. To resolve this, we propose a … Read more

On Polyhedral and Second-Order-Cone Decompositions of Semidefinite Optimization Problems

We study a cutting-plane method for semidefinite optimization problems (SDOs), and supply a proof of the method’s convergence, under a boundedness assumption. By relating the method’s rate of convergence to an initial outer approximation’s diameter, we argue that the method performs well when initialized with a second-order-cone approximation, instead of a linear approximation. We invoke … Read more

Exact solution of the donor-limited nearest neighbor hot deck imputation problem

Data quality in population surveys suffers from missing responses. We use combinatorial optimization to create a complete and coherent data set. The methods are based on the widespread nearest neighbor hot deck imputation method that replaces the missing values with observed values from a close unit, the so-called donor. As a repeated use of donors … Read more

Computing Estimators of Dantzig Selector type via Column and Constraint Generation

We consider a class of linear-programming based estimators in reconstructing a sparse signal from linear measurements. Specific formulations of the reconstruction problem considered here include Dantzig selector, basis pursuit (for the case in which the measurements contain no errors), and the fused Dantzig selector (for the case in which the underlying signal is piecewise constant). … Read more

A mixed-integer optimization approach to an exhaustive cross-validated model selection for regression

We consider a linear regression model for which we assume that many of the observed regressors are irrelevant for the prediction. To avoid overfitting, we conduct a variable selection and only include the true predictors for the least square fitting. The best subset selection gained much interest in recent years for addressing this objective. For … Read more

Integer Programming for Learning Directed Acyclic Graphs from Continuous Data

Learning directed acyclic graphs (DAGs) from data is a challenging task both in theory and in practice, because the number of possible DAGs scales superexponentially with the number of nodes. In this paper, we study the problem of learning an optimal DAG from continuous observational data. We cast this problem in the form of a … Read more

An Alternating Manifold Proximal Gradient Method for Sparse PCA and Sparse CCA

Sparse principal component analysis (PCA) and sparse canonical correlation analysis (CCA) are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimization problem with nonsmooth objective and nonconvex constraints. Since non-smoothness and nonconvexity bring numerical difficulties, most algorithms suggested in the literature either solve … Read more

High-Order Evaluation Complexity for Convexly-Constrained Optimization with Non-Lipschitzian Group Sparsity Terms

This paper studies high-order evaluation complexity for partially separable convexly-constrained optimization involving non-Lipschitzian group sparsity terms in a nonconvex objective function. We propose a partially separable adaptive regularization algorithm using a $p$-th order Taylor model and show that the algorithm can produce an (epsilon,delta)-approximate q-th-order stationary point in at most O(epsilon^{-(p+1)/(p-q+1)}) evaluations of the objective … Read more

Recovery of a mixture of Gaussians by sum-of-norms clustering

Sum-of-norms clustering is a method for assigning $n$ points in $\R^d$ to $K$ clusters, $1\le K\le n$, using convex optimization. Recently, Panahi et al.\ proved that sum-of-norms clustering is guaranteed to recover a mixture of Gaussians under the restriction that the number of samples is not too large. The purpose of this note is to … Read more