Data-Dependent Complexity of First-Order Methods for Binary Classification

Large-scale problems in data science are often modeled with optimization, and the optimization model is usually solved with first-order methods that may converge at a sublinear rate. Therefore, it is of interest to terminate the optimization algorithm as soon as the underlying data science task is accomplished. We consider FISTA for solving two binary classification …
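As background for the truncated abstract above, the FISTA iteration it refers to is the standard accelerated proximal-gradient method. A minimal sketch (generic FISTA only, not the paper's data-dependent termination rule; `grad_g`, `prox_h`, and the Lipschitz constant `L` are illustrative names):

```python
import numpy as np

def fista(grad_g, prox_h, L, x0, iters=100):
    """FISTA (Beck & Teboulle): minimize g(x) + h(x), where g is smooth
    with L-Lipschitz gradient and h has an easy proximal operator."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = prox_h(y - grad_g(y) / L, 1.0 / L)       # proximal gradient step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

# Example: min 0.5*||x - b||^2 + lam*||x||_1, whose closed-form solution
# is the soft-thresholding of b.
soft = lambda z, s: np.sign(z) * np.maximum(np.abs(z) - s, 0.0)
b, lam = np.array([1.0, 2.0]), 0.5
x_star = fista(lambda y: y - b, lambda z, s: soft(z, s * lam), 1.0, np.zeros(2))
```

For binary classification, `grad_g` would instead be the gradient of a smooth loss such as the logistic loss over the training data.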

A Dual Riemannian ADMM Algorithm for Low-Rank SDPs with Unit Diagonal

This paper proposes a dual Riemannian alternating direction method of multipliers (ADMM) for solving low-rank semidefinite programs with unit diagonal constraints. We recast the ADMM subproblem as a Riemannian optimization problem over the oblique manifold by performing the Burer-Monteiro factorization. Global convergence of the algorithm is established assuming that the subproblem is solved to certain …
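For concreteness: under the Burer-Monteiro factorization \(X = VV^\top\), the unit-diagonal constraint \(X_{ii}=1\) means each row of \(V\) has unit norm, i.e. \(V\) lies on the oblique manifold. A minimal numpy sketch of the associated retraction and tangent-space projection (illustrative helpers only, not the paper's ADMM solver):

```python
import numpy as np

def retract_oblique(V):
    # Normalize each row to unit norm: rows of V live on unit spheres,
    # so V is on the oblique manifold and X = V V^T has unit diagonal.
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def riemannian_grad(V, G):
    # Project the Euclidean gradient G onto the tangent space at V by
    # removing the radial (row-wise) component.
    radial = np.sum(G * V, axis=1, keepdims=True)
    return G - radial * V

def gradient_step(C, V, step=0.1):
    # One projected gradient step for min <C, V V^T> s.t. diag(V V^T) = 1.
    G = 2.0 * C @ V                       # Euclidean gradient of <C, V V^T>
    return retract_oblique(V - step * riemannian_grad(V, G))
```

The feasible iterates stay on the manifold by construction, which is what makes Riemannian methods attractive for this constraint structure.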

A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing …
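One well-known auxiliary-loss-free strategy in this family (popularized by DeepSeek-V3; shown here only as an illustrative sketch, not necessarily the routing rule the paper analyzes) steers load with a per-expert bias that affects expert selection but not the training loss:

```python
import numpy as np

def top1_route(scores, bias):
    # The bias enters expert *selection* only; the gating weights applied
    # to expert outputs would still use the raw scores, so no auxiliary
    # balancing loss term is added to the training objective.
    return np.argmax(scores + bias, axis=1)

def update_bias(bias, counts, target, lr=0.01):
    # Raise the bias of under-loaded experts, lower it for over-loaded ones.
    return bias + lr * np.sign(target - counts)
```

Running the bias update on a batch where one expert dominates gradually equalizes the per-expert token counts without any extra loss term.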

Semidefinite programming via Projective Cutting Planes for dense (easily-feasible) instances

The cone of positive semidefinite (PSD) matrices can be described by an infinite number of linear constraints. It is well known that one can optimize over such a feasible region by standard Cutting Planes, but work on this idea remains rare, likely due to its limited practical appeal compared to Interior Point Methods (IPMs). …
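The infinite linear description referred to here is \(v^\top X v \ge 0\) for all vectors \(v\), and the separation oracle behind a cutting-plane scheme is an eigenvalue computation. A minimal sketch (generic separation only, not the paper's Projective Cutting Planes method):

```python
import numpy as np

def psd_cut(X, tol=1e-9):
    # X is PSD iff v' X v >= 0 for every v. If X has a negative
    # eigenvalue, the corresponding eigenvector v gives a violated
    # linear cut <v v', X> >= 0 that separates X from the PSD cone.
    w, U = np.linalg.eigh(X)           # eigenvalues in ascending order
    if w[0] >= -tol:
        return None                    # X is (numerically) PSD: no cut
    v = U[:, 0]                        # eigenvector of the most negative eigenvalue
    return np.outer(v, v)              # cut matrix C with <C, X> >= 0 violated
```

Each generated cut is a single linear constraint on the matrix entries, so the outer problem stays an LP.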

Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations

We study Sparse Multiple Kernel Learning (SMKL), which is the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing \(\ell_1\)‐regularized approaches that approximate a sparsifying penalty, we formulate the problem by imposing an explicit cardinality constraint on the kernel weights and add an \(\ell_2\) penalty for robustness. …
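To make the cardinality-constrained kernel combination concrete, here is a small sketch (illustrative helpers with assumed names, not the paper's alternating best-response or semidefinite-relaxation algorithms):

```python
import numpy as np

def sparse_simplex_project(w, k):
    # Heuristic step used in alternating schemes: keep the k largest
    # weights, zero the rest, and renormalize so the surviving weights
    # form a sparse convex combination (nonnegative, summing to one).
    # Assumes at least one positive weight.
    w = np.maximum(np.asarray(w, dtype=float), 0.0)
    if k < len(w):
        w[np.argsort(w)[:-k]] = 0.0    # zero out all but the k largest
    return w / w.sum()

def combine_kernels(kernels, w):
    # Combined Gram matrix K = sum_m w_m K_m over the prespecified kernels.
    return np.tensordot(w, np.asarray(kernels), axes=1)
```

With \(\|w\|_0 \le k\) enforced this way, the resulting combined kernel uses at most \(k\) of the candidate Gram matrices.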

Subgame Perfect Methods in Nonsmooth Convex Optimization

This paper considers nonsmooth convex optimization with either a subgradient or proximal operator oracle. In both settings, we identify algorithms that achieve the recently introduced game-theoretic optimality notion for algorithms known as subgame perfection. Subgame perfect algorithms meet a more stringent requirement than just minimax optimality. Not only must they provide optimal uniform guarantees on …

An Optimal Solution is Not Enough: Alternative Solutions and Optimal Power Systems

Power systems modeling and planning have long leveraged mathematical programming for its ability to provide optimality and feasibility guarantees. One feature that has been recognized in the optimization literature since the 1970s is the existence and meaning of multiple exact optimal and near-optimal solutions, which we call alternative solutions. In power systems modeling, the use …

Exact Decentralized Optimization via Explicit $\ell_1$ Consensus Penalties

Consensus optimization enables autonomous agents to solve joint tasks through peer-to-peer exchanges alone. Classical decentralized gradient descent is appealing for its minimal state but fails to achieve exact consensus with fixed stepsizes unless additional trackers or dual variables are introduced. We revisit penalty methods and introduce a decentralized two-layer framework that couples an outer penalty-continuation …
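The fixed-stepsize inexactness that motivates this work is easy to reproduce: with mixing matrix \(W\) and stepsize \(\alpha\), the fixed point of the classical update \(x \leftarrow Wx - \alpha \nabla f(x)\) is not a consensus point. A two-agent sketch (illustrating the known limitation only, not the paper's \(\ell_1\) penalty method):

```python
import numpy as np

# Two agents with local costs f_i(x) = 0.5 * (x - b_i)^2; the joint
# optimum is the average of b, here x* = 1.
b = np.array([0.0, 2.0])
W = np.array([[0.5, 0.5], [0.5, 0.5]])   # doubly stochastic mixing matrix
alpha = 0.1                              # fixed stepsize
x = np.zeros(2)
for _ in range(500):
    x = W @ x - alpha * (x - b)          # classical decentralized gradient descent

disagreement = abs(x[0] - x[1])          # stays bounded away from zero
```

The iterates converge, but to \(x = (10/11, 12/11)\) rather than to the consensus optimum \((1, 1)\); shrinking \(\alpha\) shrinks the gap but never removes it.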

Dimensionality Reduction in Bilevel Linear Programming

We consider bilevel programs that involve a leader, who first commits to a mixed-integer decision, and a follower, who observes this decision and then responds rationally by solving a linear program (LP). Standard approaches often reformulate these bilevel optimization problems as single-level mixed-integer programs by exploiting the follower’s LP optimality conditions. These reformulations introduce either …
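As background, the KKT-based reformulation behind such standard approaches can be sketched as follows (generic notation assumed for illustration, not the paper's specific construction). For a follower solving \(\min_y \{d^\top y : Cy \le b - Ax\}\) given the leader's decision \(x\), optimality of \(y\) is equivalent to the KKT system

\[
Cy \le b - Ax, \qquad \lambda \ge 0, \qquad d + C^\top \lambda = 0, \qquad \lambda_i \,(b - Ax - Cy)_i = 0 \ \ \forall i,
\]

and the bilinear complementarity conditions are typically linearized with binaries \(z_i \in \{0,1\}\) via \(\lambda_i \le M z_i\) and \((b - Ax - Cy)_i \le M(1 - z_i)\), at the cost of large big-\(M\) constants and one binary variable per follower constraint.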

A Framework for Handling and Exploiting Symmetry in Benders’ Decomposition

Benders’ decomposition (BD) is a framework for solving optimization problems by removing some variables and modeling their contribution to the original problem via so-called Benders cuts. While many advanced optimization techniques can be applied in a BD framework, one central technique has not been applied systematically in BD: symmetry handling. The main reason for this …
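For readers unfamiliar with the cuts mentioned here, they arise from LP duality (a generic sketch with assumed notation). Removing the variables \(x\) from \(\min\{c^\top y + f^\top x : Ax \ge b - By,\ x \ge 0,\ y \in Y\}\) leaves a master problem in \((y, \theta)\), where \(\theta\) underestimates the subproblem value

\[
q(y) \;=\; \min_{x \ge 0}\{f^\top x : Ax \ge b - By\} \;=\; \max_{u \ge 0}\{u^\top (b - By) : A^\top u \le f\},
\]

so every dual extreme point \(\bar u\) yields an optimality cut \(\theta \ge \bar u^\top (b - By)\), and every dual extreme ray \(\bar r\) yields a feasibility cut \(\bar r^\top (b - By) \le 0\).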