Non-asymptotic Results for Langevin Monte Carlo: Coordinate-wise and Black-box Sampling

Euler-Maruyama and Ozaki discretization of a continuous time diffusion process is a popular technique for sampling, that uses (upto) gradient and Hessian information of the density respectively. The Euler-Maruyama discretization has been used particularly for sampling under the name of Langevin Monte Carlo (LMC) for sampling from strongly log-concave densities. In this work, we make … Read more

Rank-one Convexification for Sparse Regression

Sparse regression models are increasingly prevalent due to their ease of interpretability and superior out-of-sample performance. However, the exact model of sparse regression with an L0 constraint restricting the support of the estimators is a challenging non-convex optimization problem. In this paper, we derive new strong convex relaxations for sparse regression. These relaxations are based … Read more

Fast Robust Methods for Singular State-Space Models

State-space models are used in a wide range of time series analysis applications. Kalman filtering and smoothing are work-horse algorithms in these settings. While classic algorithms assume Gaussian errors to simplify estimation, recent advances use a broad range of optimization formulations to allow outlier-robust estimation, as well as constraints to capture prior information. Here we … Read more

Best Subset Selection via Cross-validation Criterion

This paper is concerned with the cross-validation criterion for best subset selection in a linear regression model. In contrast with the use of statistical criteria (e.g., Mallows’ $C_p$, AIC, BIC, and various information criteria), the cross-validation only requires the mild assumptions, namely, samples are identically distributed, and training and validation samples are independent. For this … Read more

A Unified Framework for Sparse Relaxed Regularized Regression: SR3

Regularized regression problems are ubiquitous in statistical modeling, signal processing, and machine learning. Sparse regression in particular has been instrumental in scientific model discovery, including compressed sensing applications, vari- able selection, and high-dimensional analysis. We propose a broad framework for sparse relaxed regularized regression, called SR3. The key idea is to solve a relaxation of … Read more

Strong mixed-integer programming formulations for trained neural networks

We present strong mixed-integer programming (MIP) formulations for high-dimensional piecewise linear functions that correspond to trained neural networks. These formulations can be used for a number of important tasks, such as verifying that an image classification network is robust to adversarial inputs, or solving decision problems where the objective function is a machine learning model. … Read more

Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods

Logistic regression is one of the most popular methods in binary classification, wherein estimation of model parameters is carried out by solving the maximum likelihood (ML) optimization problem, and the ML estimator is defined to be the optimal solution of this problem. It is well known that the ML estimator exists when the data is … Read more

Learning a Mixture of Gaussians via Mixed Integer Optimization

We consider the problem of estimating the parameters of a multivariate Gaussian mixture model (GMM) given access to $n$ samples $\x_1,\x_2,\ldots ,\x_n \in\mathbb{R}^d$ that are believed to have come from a mixture of multiple subpopulations. State-of-the-art algorithms used to recover these parameters use heuristics to either maximize the log-likelihood of the sample or try to … Read more

Empirical Bounds on Linear Regions of Deep Rectifier Networks

One form of characterizing the expressiveness of a piecewise linear neural network is by the number of linear regions, or pieces, of the function modeled. We have observed substantial progress in this topic through lower and upper bounds on the maximum number of linear regions and a counting procedure. However, these bounds only account for … Read more

Wasserstein Distributionally Robust Kalman Filtering

We study a distributionally robust mean square error estimation problem over a nonconvex Wasserstein ambiguity set containing only normal distributions. We show that the optimal estimator and the least favorable distribution form a Nash equilibrium. Despite the non-convex nature of the ambiguity set, we prove that the estimation problem is equivalent to a tractable convex … Read more