A Geometric Perspective on Polynomially Solvable Convex Maximization

Convex maximization arises in many applications but is generally NP-hard, even for low-rank objectives. This paper introduces a set of broadly applicable conditions that certify when such problems are polynomially solvable. Our main condition is a new property of the feasible set, which we term co-monotonicity. We show that this property holds for two important … Read more

An Inexact Modified Quasi-Newton Method for Nonsmooth Regularized Optimization

We introduce iR2N, a modified proximal quasi-Newton method for minimizing the sum of a \(C^1\) function \(f\) and a lower semi-continuous, prox-bounded function \(h\); the method permits inexact evaluations of \(f\), \(\nabla f\), and the relevant proximal operators. Both \(f\) and \(h\) may be nonconvex. In applications where the proximal operator of \(h\) is not … Read more
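
As a rough illustration of the proximal quasi-Newton template behind methods of this kind (this is not the paper's iR2N algorithm, and its inexactness mechanisms are omitted), the following Python sketch uses a scalar Barzilai-Borwein curvature estimate for \(f\) and soft-thresholding as the prox of \(h = \lambda\|\cdot\|_1\); all names and constants are illustrative.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_quasi_newton(grad_f, x0, lam, sigma, iters=200):
    """Toy proximal quasi-Newton loop with diagonal model sigma * I:
    x+ = prox_{(lam/sigma) ||.||_1}(x - grad_f(x) / sigma), where sigma
    is refreshed by a Barzilai-Borwein curvature estimate."""
    x, g = x0.copy(), grad_f(x0)
    for _ in range(iters):
        x_new = prox_l1(x - g / sigma, lam / sigma)
        g_new = grad_f(x_new)
        s, y = x_new - x, g_new - g
        if s @ s > 0:
            sigma = max(abs(s @ y) / (s @ s), 1e-8)
        x, g = x_new, g_new
    return x

# Toy instance: f(x) = 0.5 * ||Ax - b||^2, h(x) = lam * ||x||_1.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 50)), rng.standard_normal(20)
x_hat = prox_quasi_newton(lambda x: A.T @ (A @ x - b), np.zeros(50),
                          lam=0.1, sigma=np.linalg.norm(A, 2) ** 2)
```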

Primal-dual resampling for solution validation in convex stochastic programming

Suppose we wish to determine the quality of a candidate solution to a convex stochastic program in which the objective function is a statistical functional parameterized by the decision variable and in which known deterministic constraints may be present. Inspired by stopping criteria in primal-dual and interior-point methods, we develop cancellation theorems that characterize the convergence of … Read more

Fast and Simple Multiclass Data Segmentation: An Eigendecomposition and Projection-Free Approach

Graph-based machine learning has attracted increasing interest over the last decade, with many connections to other fields of applied mathematics. Learning based on partial differential equations, such as the phase-field Allen-Cahn equation, enables efficient semi-supervised learning on graphs. The numerical solution of the graph Allen-Cahn equation via a convexity splitting or … Read more
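
To make the scheme concrete, here is a minimal sketch of one convexity-splitting time step for the binary graph Allen-Cahn equation, assuming the standard double-well potential \(W(u) = (u^2 - 1)^2/4\); the graph, step size, and splitting constant are illustrative, and the paper's multiclass, eigendecomposition- and projection-free variant is not reproduced.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W from a weight matrix."""
    return np.diag(W.sum(axis=1)) - W

def allen_cahn_step(u, L, dt=0.1, eps=1.0, c=3.0):
    """One convexity-splitting step for u_t = -eps*L u - W'(u)/eps with
    W(u) = (u^2 - 1)^2 / 4: the stiff convex part (eps*L + c*I) is
    treated implicitly, the concave remainder explicitly."""
    n = len(u)
    Wp = u**3 - u                              # W'(u)
    A = np.eye(n) + dt * (eps * L + c * np.eye(n))
    rhs = u + dt * (c * u - Wp / eps)
    return np.linalg.solve(A, rhs)

# Tiny example: two weakly linked clusters relax toward +/-1 labels.
W = np.array([[0, 1, 0.1, 0], [1, 0, 0, 0.1],
              [0.1, 0, 0, 1], [0, 0.1, 1, 0]], float)
u = np.array([0.5, 0.4, -0.4, -0.5])
for _ in range(50):
    u = allen_cahn_step(u, graph_laplacian(W))
```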

Iterative Sampling Methods for Sinkhorn Distributionally Robust Optimization

Distributionally robust optimization (DRO) has emerged as a powerful paradigm for reliable decision-making under uncertainty. This paper focuses on DRO with ambiguity sets defined via the Sinkhorn discrepancy, an entropy-regularized Wasserstein distance; we refer to this setting as Sinkhorn DRO. Existing work primarily addresses Sinkhorn DRO from a dual perspective, leveraging its formulation as a conditional stochastic optimization … Read more
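
For orientation, the Sinkhorn discrepancy is built on entropy-regularized optimal transport, classically computed by Sinkhorn's matrix-scaling iterations; a minimal sketch of those iterations follows (the paper's iterative sampling methods for the DRO problem itself are not reproduced).

```python
import numpy as np

def sinkhorn_coupling(mu, nu, C, reg=0.1, iters=200):
    """Classical Sinkhorn scaling for entropy-regularized optimal
    transport: alternately rescale rows/columns of the Gibbs kernel
    K = exp(-C/reg) so P = diag(u) K diag(v) matches both marginals."""
    K = np.exp(-C / reg)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)   # enforce column marginal nu
        u = mu / (K @ v)     # enforce row marginal mu
    return u[:, None] * K * v[None, :]

# Toy example on 3 points: uniform marginals, squared-distance cost.
x = np.array([0.0, 1.0, 2.0])
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn_coupling(np.full(3, 1 / 3), np.full(3, 1 / 3), C)
cost = np.sum(P * C)  # entropic transport cost under the coupling P
```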

An Elementary Proof of the Near Optimality of LogSumExp Smoothing

We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln\left(\sum_{i=1}^d\exp(x_i)\right)$ provides a classical smoothing, differing from the max function in value by at most $\ln(d)$. We provide an elementary construction of a lower bound, establishing that every overestimating smoothing of the max function must … Read more
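
A quick numeric check of the stated $\ln(d)$ gap, using the standard max-shifted evaluation for numerical stability:

```python
import numpy as np

def logsumexp(x):
    """LogSumExp smoothing of max: ln(sum_i exp(x_i)), shifted by the
    max for numerical stability."""
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

x = np.random.default_rng(0).standard_normal(1000)
gap = logsumexp(x) - x.max()
assert 0.0 <= gap <= np.log(x.size)  # overestimates max by at most ln(d)
```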

A Taxonomy of Multi-Objective Alignment Techniques for Large Language Models

Aligning large language models (LLMs) with human preferences has evolved from single-objective reward maximization to sophisticated multi-objective optimization. Real-world deployment requires balancing competing objectives (helpfulness, harmlessness, honesty, instruction-following, and task-specific capabilities) that often conflict. This survey provides a systematic taxonomy of multi-objective alignment techniques, organizing the rapidly growing literature into four categories: (1) Reward Decomposition approaches that … Read more
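
As a toy instance of the scalarization idea underlying reward-decomposition approaches (purely illustrative; the objective names, weights, and scores below are not from the survey):

```python
# Hypothetical per-objective scores for one candidate response, e.g.
# produced by separate reward models (all values are illustrative).
rewards = {"helpfulness": 0.8, "harmlessness": 0.95, "honesty": 0.7}
weights = {"helpfulness": 0.5, "harmlessness": 0.3, "honesty": 0.2}

# Linear scalarization collapses the reward vector into one training
# signal r(x) = sum_k w_k * r_k(x); varying the weights traces out
# different trade-offs among the competing objectives.
scalar_reward = sum(weights[k] * rewards[k] for k in rewards)
```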

Data-Dependent Complexity of First-Order Methods for Binary Classification

Large-scale problems in data science are often modeled as optimization problems, which are usually solved with first-order methods that may converge only at a sublinear rate. It is therefore of interest to terminate the optimization algorithm as soon as the underlying data science task is accomplished. We consider FISTA for solving two binary classification … Read more
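
For reference, a generic FISTA sketch applied to $\ell_1$-regularized logistic regression (a standard instance; the paper's specific classification formulations and data-dependent termination criteria are not reproduced):

```python
import numpy as np

def fista(grad_f, prox_h, x0, step, iters=500):
    """FISTA: accelerated proximal gradient for min f(x) + h(x), with
    f smooth (gradient grad_f) and h prox-friendly (prox_h)."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_new = prox_h(y - step * grad_f(y), step)
        t_new = (1 + np.sqrt(1 + 4 * t**2)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)  # momentum step
        x, t = x_new, t_new
    return x

# l1-regularized logistic regression on random data, labels in {-1, +1}.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 20))
b = np.sign(rng.standard_normal(100))
lam = 0.05
grad = lambda w: A.T @ (-b / (1 + np.exp(b * (A @ w)))) / len(b)
prox = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0)
w = fista(grad, prox, np.zeros(20), step=1.0)
```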

A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing … Read more
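
One simple mechanism in this family, sketched below with illustrative constants, routes each token to the top-\(k\) experts by biased scores and then nudges each expert's bias according to its observed load, so that no balancing term enters the training loss; this is a generic sketch, not the paper's framework or analysis.

```python
import numpy as np

def route_tokens(scores, bias, k=2):
    """Pick the top-k experts per token using biased scores; the bias
    only affects selection, not the gating weights used downstream."""
    return np.argsort(-(scores + bias), axis=1)[:, :k]

def update_bias(bias, assignments, n_experts, lr=0.01):
    """Auxiliary-loss-free balancing: push an expert's bias down when
    it is overloaded and up when it is underloaded."""
    load = np.bincount(assignments.ravel(), minlength=n_experts)
    target = assignments.size / n_experts
    return bias - lr * np.sign(load - target)

# Toy run: 256 tokens routed over 8 experts, biases adapt across steps.
rng = np.random.default_rng(0)
n_experts, bias = 8, np.zeros(8)
for _ in range(100):
    scores = rng.standard_normal((256, n_experts))
    chosen = route_tokens(scores, bias)
    bias = update_bias(bias, chosen, n_experts)
```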

Preconditioned subgradient method for composite optimization: overparameterization and fast convergence

Composite optimization problems involve minimizing the composition of a smooth map with a convex function. Such objectives arise in numerous data science and signal processing applications, including phase retrieval, blind deconvolution, and collaborative filtering. The subgradient method achieves local linear convergence when the composite loss is well-conditioned. However, if the smooth map is, in a … Read more
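
To fix ideas, here is a sketch of a preconditioned subgradient step for robust phase retrieval, one of the applications mentioned; the Gauss-Newton-style preconditioner \(J^\top J + \delta I\) is an illustrative choice, not necessarily the paper's, and all constants are toy values.

```python
import numpy as np

def precond_subgrad_step(x, A, b, step=0.01, damping=1e-3):
    """One subgradient step for the robust phase-retrieval loss
    (1/m) * sum_i |(a_i^T x)^2 - b_i|, preconditioned Gauss-Newton
    style with J^T J + damping*I (an illustrative preconditioner)."""
    r = (A @ x) ** 2 - b               # residuals c(x)
    J = 2 * (A @ x)[:, None] * A       # Jacobian of the smooth map c
    g = J.T @ np.sign(r) / len(b)      # subgradient of h(c(x)), h = mean |.|
    d = np.linalg.solve(J.T @ J / len(b) + damping * np.eye(len(x)), g)
    return x - step * d

# Toy instance: recover x_star (up to sign) from m quadratic measurements.
rng = np.random.default_rng(0)
m, n = 200, 10
A = rng.standard_normal((m, n))
x_star = rng.standard_normal(n)
b = (A @ x_star) ** 2
x = x_star + 0.1 * rng.standard_normal(n)  # warm start near the signal
for _ in range(500):
    x = precond_subgrad_step(x, A, b)
```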