Statistics – Page 5 – Optimization Online

Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization

Published: 2019/11/08, Updated: 2019/11/12

Infinite Dimensional Optimization, Statistics, Stochastic Programming affine estimator, mean square error, wasserstein distance

We introduce a distributionally robust minimium mean square error estimation model with a Wasserstein ambiguity set to recover an unknown signal from a noisy observation. The proposed model can be viewed as a zero-sum game between a statistician choosing an estimator—that is, a measurable function of the observation—and a fictitious adversary choosing a prior—that is, … Read more

Stochastic Discrete First-order Algorithm for Feature Subset Selection

Published: 2019/10/09

Kota Kudo

Yuichi Takano

Ryo Nomura

Integer Programming, Statistics feature subset selection, linear regression, machine learning, optimization algorithm, statistics

This paper addresses the problem of selecting a significant subset of candidate features to use for multiple linear regression. Bertsimas et al. (2016) recently proposed the discrete first-order (DFO) algorithm to efficiently find near-optimal solutions to this problem. However, this algorithm is unable to escape from locally optimal solutions. To resolve this, we propose a … Read more

On Polyhedral and Second-Order-Cone Decompositions of Semidefinite Optimization Problems

Published: 2019/09/30, Updated: 2019/12/02

Dimitris Bertsimas

Ryan Cory-Wright

Linear, Cone and Semidefinite Programming, Semi-definite Programming, Statistics cutting planes, nuclear norm, semidefinite programming, sparse principal component analaysis

We study a cutting-plane method for semidefinite optimization problems (SDOs), and supply a proof of the method’s convergence, under a boundedness assumption. By relating the method’s rate of convergence to an initial outer approximation’s diameter, we argue that the method performs well when initialized with a second-order-cone approximation, instead of a linear approximation. We invoke … Read more

Exact solution of the donor-limited nearest neighbor hot deck imputation problem

Published: 2019/08/20

Graphs and Matroids, Network Optimization, Statistics b-matching, combinatorial optimization, missing data, nearest neighbor hot deck imputation, survey data

Data quality in population surveys suffers from missing responses. We use combinatorial optimization to create a complete and coherent data set. The methods are based on the widespread nearest neighbor hot deck imputation method that replaces the missing values with observed values from a close unit, the so-called donor. As a repeated use of donors … Read more

Computing Estimators of Dantzig Selector type via Column and Constraint Generation

Published: 2019/08/18

Rahul Mazumder

Stephen Wright

Andrew Zheng

Linear Programming, Statistics column-and-constraint generation, dantzig selection, simplex method

We consider a class of linear-programming based estimators in reconstructing a sparse signal from linear measurements. Specific formulations of the reconstruction problem considered here include Dantzig selector, basis pursuit (for the case in which the measurements contain no errors), and the fused Dantzig selector (for the case in which the underlying signal is piecewise constant). … Read more

A mixed-integer optimization approach to an exhaustive cross-validated model selection for regression

Published: 2019/05/03, Updated: 2019/09/06

Dennis Kreber

(Mixed) Integer Nonlinear Programming, Statistics best subset selection, bilevel optimization, cross-validation, mixed-integer quadratic programming, sparse regression

We consider a linear regression model for which we assume that many of the observed regressors are irrelevant for the prediction. To avoid overfitting, we conduct a variable selection and only include the true predictors for the least square fitting. The best subset selection gained much interest in recent years for addressing this objective. For … Read more

Integer Programming for Learning Directed Acyclic Graphs from Continuous Data

Published: 2019/04/20, Updated: 2020/06/19

(Mixed) Integer Linear Programming, Statistics continuous data, directed acyclic graph, integer programming, structural learning

Learning directed acyclic graphs (DAGs) from data is a challenging task both in theory and in practice, because the number of possible DAGs scales superexponentially with the number of nodes. In this paper, we study the problem of learning an optimal DAG from continuous observational data. We cast this problem in the form of a … Read more

An Alternating Manifold Proximal Gradient Method for Sparse PCA and Sparse CCA

Published: 2019/03/27

Constrained Nonlinear Optimization, Statistics

Sparse principal component analysis (PCA) and sparse canonical correlation analysis (CCA) are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimization problem with nonsmooth objective and nonconvex constraints. Since non-smoothness and nonconvexity bring numerical difficulties, most algorithms suggested in the literature either solve … Read more

High-Order Evaluation Complexity for Convexly-Constrained Optimization with Non-Lipschitzian Group Sparsity Terms

Published: 2019/02/27

Xiaojun Chen

Philippe L. Toint

Constrained Nonlinear Optimization, Data-Mining, Statistics complexity, group sparsity, isotropic model, non-lipschitz functions, nonlinear optimization, partially-separable problems

This paper studies high-order evaluation complexity for partially separable convexly-constrained optimization involving non-Lipschitzian group sparsity terms in a nonconvex objective function. We propose a partially separable adaptive regularization algorithm using a $p$-th order Taylor model and show that the algorithm can produce an (epsilon,delta)-approximate q-th-order stationary point in at most O(epsilon^{-(p+1)/(p-q+1)}) evaluations of the objective … Read more

Recovery of a mixture of Gaussians by sum-of-norms clustering

Published: 2019/02/19

Tao Jiang

Stephen A. Vavasis

Chen Wen Zhai

Second-Order Cone Programming, Statistics

Sum-of-norms clustering is a method for assigning $n$ points in $\R^d$ to $K$ clusters, $1\le K\le n$, using convex optimization. Recently, Panahi et al.\ proved that sum-of-norms clustering is guaranteed to recover a mixture of Gaussians under the restriction that the number of samples is not too large. The purpose of this note is to … Read more