An Elementary Proof of the Near Optimality of LogSumExp Smoothing

We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln(\sum^d_i\exp(x_i))$ provides a classical smoothing, differing from the max function in value by at most $\ln(d)$. We provide an elementary construction of a lower bound, establishing that every overestimating smoothing of the max function must … Read more

Robust optimality for nonsmooth mathematical programs with equilibrium constraints under data uncertainty

We develop a unified framework for robust nonsmooth optimization problems with equilibrium constraints (UNMPEC). As a foundation, we study a robust nonsmooth nonlinear program with uncertainty in both the objective function and the inequality constraints (UNP). Using Clarke subdifferentials, we establish Karush–Kuhn–Tucker (KKT)–type necessary optimality conditions under an extended no–nonzero–abnormal–multiplier constraint qualification (ENNAMCQ). When the … Read more

New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes

In this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of … Read more

Subsampled cubic regularization method with distinct sample sizes for function, gradient, and Hessian

We develop and study a subsampled cubic regularization method for finite-sum composite optimization problems, in which the function, gradient, and Hessian are estimated using possibly different sample sizes. By allowing each quantity to have its own sampling strategy, the proposed method offers greater flexibility to control the accuracy of the model components and to better … Read more

Data-Dependent Complexity of First-Order Methods for Binary Classification

Large-scale problems in data science are often modeled with optimization, and the optimization model is usually solved with first-order methods that may converge at a sublinear rate. Therefore, it is of interest to terminate the optimization algorithm as soon as the underlying data science task is accomplished. We consider FISTA for solving two binary classification … Read more

A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing … Read more

Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations

We study Sparse Multiple Kernel Learning (SMKL), which is the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing \(\ell_1\)‐regularized approaches that approximate a sparsifying penalty, we formulate the problem by imposing an explicit cardinality constraint on the kernel weights and add an \(\ell_2\) penalty for robustness. … Read more

Subgame Perfect Methods in Nonsmooth Convex Optimization

This paper considers nonsmooth convex optimization with either a subgradient or proximal operator oracle. In both settings, we identify algorithms that achieve the recently introduced game-theoretic optimality notion for algorithms known as subgame perfection. Subgame perfect algorithms meet a more stringent requirement than just minimax optimality. Not only must they provide optimal uniform guarantees on … Read more

On the Convergence of Constrained Gradient Method

The constrained gradient method (CGM) has recently been proposed to solve convex optimization and monotone variational inequality (VI) problems with general functional constraints. While existing literature has established convergence results for CGM, the assumptions employed therein are quite restrictive; in some cases, certain assumptions are mutually inconsistent, leading to gaps in the underlying analysis. This … Read more

Supermodularity, Curvature, and Convex Relaxations in a Class of Quadratic Binary Optimization Problems

We study the combinatorial structure of a quadratic set function $F(S)$ arising from a class of binary optimization models within the family of undesirable facility location problems. Despite strong empirical evidence of nested optimal solutions in previously studied real-world instances, we show that $F(S)$ is, in general, neither submodular nor supermodular. To quantify deviation from … Read more