Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks

In this work, we instantiate a regularized form of the gradient clipping algorithm and prove that it can converge to the global minima of deep neural network loss functions provided that the net is of sufficient width. We present empirical evidence that our theoretically grounded regularized gradient clipping algorithm is also competitive with the state-of-the-art …
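To make the idea concrete, here is a minimal sketch of one gradient descent step with a regularized clipping rule. The specific rule below (standard norm clipping plus an additive `delta` term that keeps the step size bounded away from zero) is a hypothetical variant for illustration only; the paper's exact update may differ.

```python
import numpy as np

def regularized_clip_step(params, grad, lr=0.1, gamma=1.0, delta=0.1):
    """One descent step with a regularized clipping rule (illustrative).

    Standard clipping scales the step by min(1, gamma / ||grad||); the
    additive delta keeps the effective step size from vanishing for very
    large gradients.  This is a sketch, not the paper's exact rule.
    """
    norm = np.linalg.norm(grad)
    scale = min(1.0, gamma / (norm + 1e-12)) + delta  # regularization term
    return params - lr * scale * grad

# Usage: minimize f(x) = ||x||^2 / 2, whose gradient is x itself.
x = np.array([10.0, -10.0])
for _ in range(200):
    x = regularized_clip_step(x, x)
# the iterate contracts toward the global minimum at the origin
```

Because `lr * scale` stays strictly below 1 here, each step is a contraction on this toy quadratic, so the iterates approach the minimizer.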

Concrete convergence rates for common fixed point problems under Karamata regularity

We introduce the notion of Karamata regular operators, which is a notion of regularity that is suitable for obtaining concrete convergence rates for common fixed point problems. This provides a broad framework that includes, but goes beyond, Hölderian error bounds and Hölder regular operators. By concrete, we mean that the rates we obtain are explicitly …

Predictive Low Rank Matrix Learning under Partial Observations: Mixed-Projection ADMM

We study the problem of learning a partially observed matrix under the low rank assumption in the presence of fully observed side information that depends linearly on the true underlying matrix. This problem constitutes an important generalization of the Matrix Completion problem, a central problem in Statistics, Operations Research and Machine Learning, that arises …

Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls

Adversarially robust optimization (ARO) has emerged as the *de facto* standard for training models that hedge against adversarial attacks in the test stage. While these models are robust against adversarial attacks, they tend to suffer severely from overfitting. To address this issue, some successful methods replace the empirical distribution in the training stage with alternatives …

Wasserstein Distributionally Robust Optimization with Heterogeneous Data Sources

We study decision problems under uncertainty, where the decision-maker has access to K data sources that carry biased information about the underlying risk factors. The biases are measured by the mismatch between the risk factor distribution and the K data-generating distributions with respect to an optimal transport (OT) distance. In this situation the decision-maker can …

A Decomposition Algorithm for Distributionally Robust Chance-Constrained Programs with Polyhedral Ambiguity Set

In this paper, we study a distributionally robust optimization approach to chance-constrained stochastic programs to hedge against uncertainty in the distributions of the random parameters. We consider a general polyhedral ambiguity set under finite support and study the Wasserstein, total variation distance, and moment-based ambiguity sets as examples for our computations. We …

Contextual Stochastic Programs with Expected-Value Constraints

Expected-value-constrained programming (ECP) formulations are a broad class of stochastic programming problems including integrated chance constraints, risk models, and stochastic dominance formulations. Given the wide availability of data, it is common in applications to have independent contextual information associated with the target or dependent random variables of the problem. We show how to incorporate such …

Distributionally Robust Optimization with Decision-Dependent Polyhedral Ambiguity

We consider a two-stage stochastic program with continuous recourse, where the distribution of the random parameters depends on the decisions. Assuming a finite sample space, we study a distributionally robust approach to this problem, where the decision-dependent distributional ambiguity is modeled with a polyhedral ambiguity set. We consider cases where the recourse function and the …

Computational Methods for the Household Assignment Problem

We consider the problem of assigning the entries of a household data set to real-world address data. This household assignment problem occurs in the geo-referencing step of spatial microsimulation models. The resulting combinatorial optimization model is a maximum weight matching problem with additional side constraints. Even for real-world instances of medium size, such as the …
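The core model can be illustrated on a tiny toy instance. The sketch below solves the plain maximum weight assignment of records to addresses by exhaustive search; the match scores are made up, and the side constraints mentioned in the abstract are omitted. Real instances require dedicated matching algorithms or MIP solvers.

```python
from itertools import permutations

def max_weight_assignment(weights):
    """Brute-force maximum weight assignment of records to addresses.

    weights[i][j] is a (hypothetical) match score between household
    record i and address j.  Exhaustive search only works for tiny toy
    instances; it serves to illustrate the objective, not the method.
    """
    n = len(weights)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

# Toy data: 3 household records vs. 3 candidate addresses.
weights = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.2, 0.4, 0.7],
]
best_total, best_perm = max_weight_assignment(weights)
# best_perm[i] is the address assigned to record i
```

The side constraints of the actual problem (e.g. capacity or demographic consistency) would prune this search space and generally break the pure matching structure.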

Granularity for mixed-integer polynomial optimization problems

Finding good feasible points is crucial in mixed-integer programming. For this purpose we combine a sufficient condition for consistency, called granularity, with the moment-/sos-hierarchy from polynomial optimization. If the mixed-integer problem is granular, we obtain feasible points by solving continuous polynomial problems and rounding their optimal points. The moment-/sos-hierarchy is then used to solve those …
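The relax-and-round step described above can be sketched on a toy point. The snippet below only illustrates the rounding of the integer-constrained coordinates of a relaxed optimum; granularity is, roughly, the slack condition under which this rounded point is guaranteed feasible for the mixed-integer problem. The relaxed optimum here is made up, not computed from an actual hierarchy.

```python
def round_relaxation(x_cont, integer_idx):
    """Round the integer-constrained coordinates of a relaxed optimum.

    x_cont is an optimal point of the continuous relaxation;
    integer_idx is the set of coordinates required to be integral.
    Under granularity, the rounded point remains feasible.
    """
    return [round(v) if i in integer_idx else v
            for i, v in enumerate(x_cont)]

# Toy: a (hypothetical) relaxed optimum with coordinate 0
# integer-constrained and coordinate 1 continuous.
x_relaxed = [2.6, 0.75]
x_feasible = round_relaxation(x_relaxed, {0})
```

In the paper's setting the continuous problems are polynomial and are solved (approximately) via the moment-/sos-hierarchy before this rounding step is applied.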