Wasserstein Distributionally Robust Optimization and Variation Regularization

Wasserstein distributionally robust optimization (DRO) has recently achieved empirical success for various applications in operations research and machine learning, owing partly to its regularization effect. Although the connection between Wasserstein DRO and regularization has been established in several settings, existing results often require restrictive assumptions, such as smoothness or convexity, that are not satisfied by …

Using Regularization and Second Order Information in Outer Approximation for Convex MINLP

In this paper, we present two new methods for solving convex mixed-integer nonlinear programming problems based on the outer approximation method. The first method is inspired by the level method and uses a regularization technique to reduce the step size when choosing new integer combinations. The second method combines ideas from both the level method …

Convergence rates of Forward-Douglas-Rachford splitting method

Over the past years, operator splitting methods have become ubiquitous for non-smooth optimization owing to their simplicity and efficiency. In this paper, we consider the Forward–Douglas–Rachford splitting method (FDR) [10, 40], and study both global and local convergence rates of this method. For the global rate, we establish an o(1/k) convergence rate in terms of …

Binary Extended Formulations of Polyhedral Mixed-integer Sets

We analyze different ways of constructing binary extended formulations of polyhedral mixed-integer sets with bounded integer variables and compare their relative strength with respect to split cuts. We show that among all binary extended formulations where each bounded integer variable is represented by a distinct collection of binary variables, what we call “unimodular” extended formulations …

Optimal Decision Trees for Categorical Data via Integer Programming

Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a novel mixed integer programming formulation …

Extended formulations for convex hulls of some bilinear functions

We consider the problem of characterizing the convex hull of the graph of a bilinear function $f$ on the $n$-dimensional unit cube $[0,1]^n$. Extended formulations for this convex hull are obtained by taking subsets of the facets of the Boolean Quadric Polytope (BQP). Extending existing results, we propose the systematic study of properties of $f$ …

GEP-MSCRA for computing the group zero-norm regularized least squares estimator

This paper is concerned with the group zero-norm regularized least squares estimator which, in terms of the variational characterization of the zero-norm, can be obtained from a mathematical program with equilibrium constraints (MPEC). By developing the global exact penalty for the MPEC, this estimator is shown to arise from an exact penalization problem that not only …

A Random Block-Coordinate Douglas-Rachford Splitting Method with Low Computational Complexity for Binary Logistic Regression

In this paper, we propose a new optimization algorithm for sparse logistic regression based on a stochastic version of the Douglas-Rachford splitting method. Our algorithm sweeps the training set by randomly selecting a mini-batch of data at each iteration, and it allows us to update the variables in a block coordinate manner. Our approach …

Smart “Predict, then Optimize”

Many real-world analytics problems involve two significant challenges: prediction and optimization. Due to the typically complex nature of each challenge, the standard paradigm is to predict, then optimize. By and large, machine learning tools are intended to minimize prediction error and do not account for how the predictions will be used in a downstream optimization …

A single potential governing convergence of conjugate gradient, accelerated gradient and geometric descent

Nesterov’s accelerated gradient (AG) method for minimizing a smooth strongly convex function $f$ is known to reduce $f({\bf x}_k)-f({\bf x}^*)$ by a factor of $\epsilon\in(0,1)$ after $k=O(\sqrt{L/\ell}\log(1/\epsilon))$ iterations, where $\ell,L$ are the two parameters of smooth strong convexity. Furthermore, it is known that this is the best possible complexity in the function-gradient oracle model of …
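
As a rough illustration of the AG iteration bound quoted in this abstract (the numbers are hypothetical, chosen only for the arithmetic, and the constant hidden by the $O(\cdot)$ notation is ignored), take a condition number $L/\ell = 10^4$ and a target factor $\epsilon = 10^{-6}$:

$$\sqrt{L/\ell}\,\log(1/\epsilon) = \sqrt{10^4}\cdot\ln(10^6) \approx 100 \times 13.8 \approx 1.4\times 10^{3}.$$

For comparison, the standard bound for plain gradient descent on the same function class scales as $(L/\ell)\log(1/\epsilon) \approx 10^4 \times 13.8 \approx 1.4\times 10^{5}$, so the square root on the condition number accounts for roughly a hundredfold reduction in this example.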