An Adaptive Proximal ADMM for Nonconvex Linearly-Constrained Composite Programs

This paper develops an adaptive Proximal Alternating Direction Method of Multipliers (P-ADMM) for solving linearly-constrained, weakly convex, composite optimization problems. This method is adaptive to all problem parameters, including smoothness and weak convexity constants. It is assumed that the smooth component of the objective is weakly convex and possibly nonseparable, while the non-smooth component is …

Black-box Optimization Algorithms for Regularized Least-squares Problems

We consider the problem of optimizing the sum of a smooth, nonconvex function for which derivatives are unavailable and a convex, nonsmooth function with an easy-to-evaluate proximal operator. Of particular focus is the case where the smooth part has a nonlinear least-squares structure. We adapt two existing approaches for derivative-free optimization of nonsmooth compositions of smooth …

On the strength of Burer’s lifted convex relaxation to quadratic programming with ball constraints

We study quadratic programs with m ball constraints and the strength of a lifted convex relaxation for them recently proposed by Burer (2024). Burer shows this relaxation is exact when m=2. For general m, Burer (2024) provides numerical evidence that this lifted relaxation is tighter than the Kronecker product based Reformulation Linearization Technique (RLT) inequalities …

A combinatorial approach to Ramana’s exact dual for semidefinite programming

Thirty years ago, in a seminal paper, Ramana derived an exact dual for Semidefinite Programming (SDP). Ramana’s dual has the following remarkable features: i) it assumes feasibility of the primal, but it does not make any regularity assumptions, such as strict feasibility; ii) its optimal value is the same as the optimal value of the …

Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks

In this work, we instantiate a regularized form of the gradient clipping algorithm and prove that it can converge to the global minima of deep neural network loss functions provided that the net is of sufficient width. We present empirical evidence that our theoretically founded regularized gradient clipping algorithm is also competitive with the state-of-the-art …

Concrete convergence rates for common fixed point problems under Karamata regularity

We introduce the notion of Karamata regular operators, which is a notion of regularity that is suitable for obtaining concrete convergence rates for common fixed point problems. This provides a broad framework that includes, but goes beyond, Hölderian error bounds and Hölder regular operators. By concrete, we mean that the rates we obtain are …

Predictive Low Rank Matrix Learning under Partial Observations: Mixed-Projection ADMM

We study the problem of learning a partially observed matrix under the low rank assumption in the presence of fully observed side information that depends linearly on the true underlying matrix. This problem is an important generalization of the Matrix Completion problem, a central problem in Statistics, Operations Research and Machine Learning, that …

Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls

Empirical risk minimization often fails to provide robustness against adversarial attacks in test data, causing poor out-of-sample performance. Adversarially robust optimization (ARO) has thus emerged as the de facto standard for obtaining models that hedge against such attacks. However, while these models are robust against adversarial attacks, they tend to suffer severely from overfitting. To …

Wasserstein Distributionally Robust Optimization with Heterogeneous Data Sources

We study decision problems under uncertainty, where the decision-maker has access to K data sources that carry biased information about the underlying risk factors. The biases are measured by the mismatch between the risk factor distribution and the K data-generating distributions with respect to an optimal transport (OT) distance. In this situation the decision-maker can …

A Decomposition Algorithm for Distributionally Robust Chance-Constrained Programs with Polyhedral Ambiguity Set

In this paper, we study a distributionally robust optimization approach to chance-constrained stochastic programs to hedge against uncertainty in the distributions of the random parameters. We consider a general polyhedral ambiguity set under finite support and study the Wasserstein ambiguity set, the total variation distance ambiguity set, and the moment-based ambiguity set as examples for our computations. We …