Facial reduction for nice (and non-nice) convex programs

Consider the primal problem of minimizing the sum of two closed proper convex functions \(f\) and \(g\). If the relative interiors of the domains of \(f\) and \(g\) intersect, then the primal problem and its corresponding Fenchel dual satisfy strong duality. When these relative interiors fail to intersect, pathologies and numerical difficulties may occur. In … Read more
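For reference (a standard statement, not text from the truncated abstract), the Fenchel primal-dual pair for this setting is
\[
p^\star \;=\; \inf_{x}\,\bigl\{ f(x) + g(x) \bigr\},
\qquad
d^\star \;=\; \sup_{y}\,\bigl\{ -f^{*}(-y) - g^{*}(y) \bigr\},
\]
where \(f^{*}\) and \(g^{*}\) are the convex conjugates. Weak duality \(d^\star \le p^\star\) always holds, and the classical sufficient condition for strong duality \(d^\star = p^\star\) with dual attainment is exactly the relative-interior condition referenced above, \(\operatorname{ri}(\operatorname{dom} f) \cap \operatorname{ri}(\operatorname{dom} g) \neq \emptyset\).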

Stability analysis of parameterized models relative to nonconvex constraints

For solution mappings of parameterized models (such as optimization problems, variational inequalities, and generalized equations), standard stability inevitably fails as the parameter approaches the boundary of the feasible domain. One remedy is relative stability restricted to a constraint set (e.g., the feasible domain), which is our focus in this paper. We establish generalized differentiation criteria … Read more
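For concreteness (a generic instance, not necessarily the paper's exact model), a parameterized generalized equation and its solution mapping take the form
\[
0 \in G(p, x) + N_{C}(x),
\qquad
S(p) \;=\; \bigl\{\, x \in C \;:\; 0 \in G(p, x) + N_{C}(x) \,\bigr\},
\]
where \(p\) is the parameter, \(G\) is a given mapping, and \(N_{C}(x)\) denotes the normal cone to the constraint set \(C\) at \(x\); variational inequalities and KKT systems of optimization problems are special cases of this format.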

An Elementary Proof of the Near Optimality of LogSumExp Smoothing

We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln\big(\sum_{i=1}^{d}\exp(x_i)\big)$ provides a classical smoothing, differing from the max function in value by at most $\ln(d)$. We provide an elementary construction of a lower bound, establishing that every overestimating smoothing of the max function must … Read more
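A minimal numerical illustration of the classical bound quoted above (my own sketch, not code from the paper): the LogSumExp smoothing overestimates the max by at most \(\ln(d)\).

```python
import numpy as np

def logsumexp(x):
    """Numerically stable LogSumExp: ln(sum_i exp(x_i))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

rng = np.random.default_rng(0)
d = 50
x = rng.normal(size=d)

gap = logsumexp(x) - np.max(x)
# 0 <= gap <= ln(d): LogSumExp overestimates the max by at most ln(d),
# and the worst case ln(d) is attained when all coordinates of x are equal.
print(f"gap = {gap:.4f},  ln(d) = {np.log(d):.4f}")
assert 0.0 <= gap <= np.log(d) + 1e-12
```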

Robust optimality for nonsmooth mathematical programs with equilibrium constraints under data uncertainty

We develop a unified framework for robust nonsmooth optimization problems with equilibrium constraints (UNMPEC). As a foundation, we study a robust nonsmooth nonlinear program with uncertainty in both the objective function and the inequality constraints (UNP). Using Clarke subdifferentials, we establish Karush–Kuhn–Tucker (KKT)–type necessary optimality conditions under an extended no–nonzero–abnormal–multiplier constraint qualification (ENNAMCQ). When the … Read more
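For orientation, a worst-case (robust) counterpart with uncertainty in both the objective and the inequality constraints can be written schematically as
\[
\min_{x \in \mathbb{R}^n} \; \max_{u \in \mathcal{U}} f(x, u)
\quad \text{s.t.} \quad
g_i(x, v_i) \le 0 \ \text{ for all } v_i \in \mathcal{V}_i, \quad i = 1, \dots, m,
\]
where \(\mathcal{U}\) and \(\mathcal{V}_i\) are the uncertainty sets and \(f(\cdot, u)\), \(g_i(\cdot, v_i)\) are locally Lipschitz, so that Clarke subdifferentials are the natural tool for stating KKT-type conditions. This is a standard template, not necessarily identical to the paper's UNP model.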

New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes

In this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of … Read more
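As a quick reminder of the rule being analyzed, here is a minimal sketch of gradient descent with the classical Polyak stepsize \(t_k = (f(x_k) - f^\star)/\lVert \nabla f(x_k)\rVert^2\) on a toy quadratic (my own illustration, assuming the optimal value \(f^\star\) is known, as the classical rule requires):

```python
import numpy as np

def polyak_gd(f, grad_f, x0, f_star, iters=100):
    """Gradient descent with the Polyak stepsize
    t_k = (f(x_k) - f_star) / ||grad f(x_k)||^2, which assumes f_star is known."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_f(x)
        gap = f(x) - f_star
        if gap <= 0.0 or not np.any(g):
            break  # optimal up to numerical precision
        x = x - (gap / np.dot(g, g)) * g
    return x

# Toy problem: f(x) = 0.5 x^T A x with A positive definite, so f_star = 0 at x = 0.
A = np.diag([1.0, 10.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
x = polyak_gd(f, grad_f, x0=np.ones(3), f_star=0.0)
print(f(x))  # close to the optimal value 0
```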

Subsampled cubic regularization method with distinct sample sizes for function, gradient, and Hessian

We develop and study a subsampled cubic regularization method for finite-sum composite optimization problems, in which the function, gradient, and Hessian are estimated using possibly different sample sizes. By allowing each quantity to have its own sampling strategy, the proposed method offers greater flexibility to control the accuracy of the model components and to better … Read more
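For context, at an iterate \(x_k\) the subsampled cubic-regularized model of the smooth finite-sum part typically has the form below, with separate index sets \(S_f\), \(S_g\), \(S_H\) for the three estimated quantities (a generic template; the paper's precise sampling conditions are in the full text):
\[
m_k(s) \;=\; f_{S_f}(x_k) + \nabla f_{S_g}(x_k)^{\top} s
+ \tfrac{1}{2}\, s^{\top} \nabla^2 f_{S_H}(x_k)\, s
+ \tfrac{\sigma_k}{3}\, \lVert s \rVert^3,
\qquad
\nabla f_{S_g}(x_k) \;=\; \frac{1}{|S_g|} \sum_{i \in S_g} \nabla f_i(x_k),
\]
with the function and Hessian estimates defined analogously over \(S_f\) and \(S_H\), and \(\sigma_k > 0\) the cubic regularization parameter.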

Data-Dependent Complexity of First-Order Methods for Binary Classification

Large-scale problems in data science are often modeled with optimization, and the optimization model is usually solved with first-order methods that may converge at a sublinear rate. Therefore, it is of interest to terminate the optimization algorithm as soon as the underlying data science task is accomplished. We consider FISTA for solving two binary classification … Read more
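As a reminder of the algorithm involved, here is a minimal FISTA sketch for composite minimization \(\min_x h(x) + r(x)\) with \(L\)-Lipschitz \(\nabla h\) (a generic implementation, not the paper's specific classification setup), followed by a tiny lasso usage example:

```python
import numpy as np

def fista(grad_h, prox_r, x0, L, iters=200):
    """Basic FISTA for min h(x) + r(x): grad_h is the gradient of the smooth
    part (L-Lipschitz), prox_r(v, t) is the proximal operator of t*r."""
    x = y = np.asarray(x0, dtype=float)
    t = 1.0
    for _ in range(iters):
        x_next = prox_r(y - grad_h(y) / L, 1.0 / L)        # proximal gradient step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)   # momentum extrapolation
        x, t = x_next, t_next
    return x

# Usage: l1-regularized least squares (lasso).
rng = np.random.default_rng(1)
A, b, lam = rng.normal(size=(20, 5)), rng.normal(size=20), 0.1
L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of x -> A^T(Ax - b)
grad_h = lambda x: A.T @ (A @ x - b)
prox_r = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)  # soft-threshold
print(fista(grad_h, prox_r, np.zeros(5), L))
```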

A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing … Read more
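One mechanism in this family, shown purely as an illustrative sketch: per-expert bias terms are added to the router scores before top-k selection and updated from the observed load, with no auxiliary loss term. Whether this matches the paper's framework cannot be determined from the truncated abstract.

```python
import numpy as np

def route_top_k(scores, bias, k):
    """Pick the top-k experts per token using bias-adjusted scores; the bias
    only influences routing, not the gating weights applied to expert outputs."""
    return np.argsort(scores + bias, axis=1)[:, -k:]

def update_bias(bias, load, step=0.01):
    """Auxiliary-loss-free adjustment: raise the bias of under-loaded experts
    and lower it for over-loaded ones, nudging future routing toward balance."""
    return bias + step * np.sign(load.mean() - load)

# Toy simulation: 1024 tokens, 8 experts, top-2 routing, expert 0 favored by the scores.
rng = np.random.default_rng(2)
n_tokens, n_experts, k = 1024, 8, 2
bias = np.zeros(n_experts)
for _ in range(200):
    scores = rng.normal(size=(n_tokens, n_experts)) + np.array([1.0] + [0.0] * (n_experts - 1))
    idx = route_top_k(scores, bias, k)
    load = np.bincount(idx.ravel(), minlength=n_experts).astype(float)
    bias = update_bias(bias, load)
print(np.bincount(idx.ravel(), minlength=n_experts))  # per-expert loads are now much more even
```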

Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations

We study Sparse Multiple Kernel Learning (SMKL), which is the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing \(\ell_1\)-regularized approaches that approximate a sparsifying penalty, we formulate the problem by imposing an explicit cardinality constraint on the kernel weights and adding an \(\ell_2\) penalty for robustness. … Read more
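Schematically, this is a cardinality-constrained kernel selection problem of the following kind (a sketch inferred from the truncated abstract; the paper's exact objective and its coupling with the SVM training problem are in the full text):
\[
\min_{\eta \in \Delta_m} \; J\Bigl(\textstyle\sum_{j=1}^{m} \eta_j K_j\Bigr) + \lambda \lVert \eta \rVert_2^2
\quad \text{s.t.} \quad \lVert \eta \rVert_0 \le s,
\]
where \(\Delta_m\) is the simplex of convex-combination weights, \(K_1, \dots, K_m\) are the prespecified kernel matrices, \(J(K)\) denotes the SVM objective value obtained with kernel \(K\), \(\lVert \eta \rVert_0\) counts the nonzero weights, and \(s\) is the sparsity budget.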

Subgame Perfect Methods in Nonsmooth Convex Optimization

This paper considers nonsmooth convex optimization with either a subgradient or proximal operator oracle. In both settings, we identify algorithms that achieve subgame perfection, a recently introduced game-theoretic optimality notion for algorithms. Subgame perfect algorithms meet a more stringent requirement than just minimax optimality. Not only must they provide optimal uniform guarantees on … Read more