Data-Dependent Complexity of First-Order Methods for Binary Classification

Large-scale problems in data science are often modeled with optimization, and the optimization model is usually solved with first-order methods that may converge at a sublinear rate. Therefore, it is of interest to terminate the optimization algorithm as soon as the underlying data science task is accomplished. We consider FISTA for solving two binary classification …
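
To make the setting concrete, here is a minimal FISTA sketch for \(\ell_1\)-regularized logistic regression, a standard binary classification model. The paper's specific losses are truncated above, so the loss, stepsize rule, and fixed iteration budget below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def fista_logreg_l1(A, b, lam, L, iters=500):
    """FISTA sketch for min_x mean(log(1 + exp(-b * (A @ x)))) + lam * ||x||_1.

    A: (m, n) data matrix; b: (m,) labels in {-1, +1};
    L: Lipschitz constant of the smooth part's gradient (e.g. ||A||_2^2 / (4m));
    lam: l1 weight. Illustrative only; not the paper's exact setup.
    """
    m, n = A.shape
    x = np.zeros(n)      # current iterate
    y = x.copy()         # extrapolated point
    t = 1.0              # momentum parameter
    for _ in range(iters):
        margin = b * (A @ y)
        grad = -(A.T @ (b / (1.0 + np.exp(margin)))) / m   # logistic-loss gradient at y
        z = y - grad / L                                   # forward (gradient) step
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # prox of l1: soft-threshold
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)      # Nesterov extrapolation
        x, t = x_new, t_new
    return x
```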

A Dual Riemannian ADMM Algorithm for Low-Rank SDPs with Unit Diagonal

This paper proposes a dual Riemannian alternating direction method of multipliers (ADMM) for solving low-rank semidefinite programs with unit diagonal constraints. We recast the ADMM subproblem as a Riemannian optimization problem over the oblique manifold by performing the Burer-Monteiro factorization. Global convergence of the algorithm is established assuming that the subproblem is solved to certain …
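
To illustrate the factorization the abstract mentions: with \(X = V^\top V\), the unit-diagonal constraint becomes unit-norm columns of \(V\), i.e. a point on the oblique manifold. Below is a sketch of one Riemannian gradient step in that geometry for a generic cost \(\langle C, V^\top V\rangle\); the paper's dual ADMM subproblem and its solver are not reproduced here.

```python
import numpy as np

def riemannian_grad_step(V, C, step):
    """One Riemannian gradient step for min_V <C, V.T @ V> with unit-norm
    columns of V (the oblique manifold), i.e. the Burer-Monteiro form of
    an SDP with unit diagonal. Sketches the manifold mechanics only.

    V: (p, n) factor with unit-norm columns; C: (n, n) symmetric cost.
    """
    egrad = 2.0 * V @ C                           # Euclidean gradient of <C, V^T V>
    # Tangent-space projection: remove each column's radial component.
    rgrad = egrad - V * np.sum(V * egrad, axis=0, keepdims=True)
    W = V - step * rgrad                          # step along the tangent direction
    return W / np.linalg.norm(W, axis=0, keepdims=True)  # retraction: renormalize columns
```

A dual ADMM would wrap such steps inside updates of the multipliers for the diagonal constraints; that outer loop is omitted here.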

A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing …
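
The truncated abstract does not specify the balancing mechanism analyzed. As a hedged illustration, one auxiliary-loss-free scheme from the literature selects experts using bias-corrected scores and nudges each expert's bias toward a balanced load, without adding any term to the training loss:

```python
import numpy as np

def route_with_bias(scores, bias, k, lr=0.01):
    """Sketch of auxiliary-loss-free top-k routing with per-expert bias
    correction (in the spirit of published bias-update schemes; the exact
    mechanism analyzed in the paper is not shown in the truncated abstract).

    scores: (tokens, experts) router affinities; bias: (experts,) balance
    offsets used only for expert selection; k: experts activated per token.
    """
    T, E = scores.shape
    topk = np.argsort(scores + bias, axis=1)[:, -k:]  # top-k experts per token, biased scores
    load = np.bincount(topk.ravel(), minlength=E)     # tokens assigned to each expert
    target = T * k / E                                # perfectly balanced load
    bias = bias + lr * np.sign(target - load)         # boost underloaded, damp overloaded
    return topk, bias
```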

Semidefinite programming via Projective Cutting Planes for dense (easily-feasible) instances

The cone of positive semidefinite (PSD) matrices can be described by an infinite number of linear constraints. It is well known that one can optimize over such a feasible region by standard Cutting Planes, but work on this idea remains rare, likely due to its limited practical appeal compared to Interior Point Methods (IPMs). …
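
The infinite linear description is \(X \succeq 0 \iff v^\top X v \ge 0\) for all \(v\), which yields the standard separation oracle sketched below: an eigenvector of the most negative eigenvalue produces a violated cut. The paper's Projective Cutting Planes variant, which projects an interior point toward the query point, is not reproduced here.

```python
import numpy as np

def psd_cut(X_hat, tol=1e-8):
    """Separation oracle sketch for the PSD cone: X >= 0 iff v^T X v >= 0
    for all v. Returns a violated linear cut <v v^T, X> >= 0 if one exists.
    Standard Cutting Planes separation only, not the projective refinement.
    """
    w, V = np.linalg.eigh((X_hat + X_hat.T) / 2.0)  # symmetrize, then eigendecompose
    if w[0] >= -tol:
        return None                                  # X_hat is (numerically) PSD
    v = V[:, 0]                                      # eigenvector of most negative eigenvalue
    return np.outer(v, v)                            # cut <v v^T, X> >= 0 cuts off X_hat
```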

Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations

We study Sparse Multiple Kernel Learning (SMKL), the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing \(\ell_1\)-regularized approaches that approximate a sparsifying penalty, we formulate the problem by imposing an explicit cardinality constraint on the kernel weights and adding an \(\ell_2\) penalty for robustness. …
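
As one hedged sketch of what an alternating best-response step on the kernel weights could look like: with the SVM dual variables fixed, choose at most \(k\) kernels and their weights to maximize a margin-minus-\(\ell_2\) surrogate. The surrogate objective here is an illustrative stand-in, not the paper's subproblem.

```python
import numpy as np

def weight_best_response(alpha, y, kernels, k, gamma):
    """Best-response sketch for kernel weights in an alternating SMKL scheme:
    with SVM dual variables alpha fixed, maximize sum_i mu_i*q_i - gamma*||mu||_2^2
    subject to mu >= 0 and at most k nonzero weights, where q_i is kernel i's
    contribution to the SVM margin term. Illustrative stand-in objective.

    kernels: list of (m, m) Gram matrices; y: (m,) labels in {-1, +1}.
    """
    ya = y * alpha
    q = np.array([ya @ K @ ya for K in kernels])  # per-kernel margin contribution
    mu = np.zeros(len(kernels))
    support = np.argsort(q)[-k:]                  # keep the k largest contributions
    mu[support] = np.maximum(q[support], 0.0) / (2.0 * gamma)  # coordinatewise optimum
    return mu
```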

Subgame Perfect Methods in Nonsmooth Convex Optimization

This paper considers nonsmooth convex optimization with either a subgradient or a proximal operator oracle. In both settings, we identify algorithms that achieve the recently introduced game-theoretic optimality notion for algorithms known as subgame perfection. Subgame perfect algorithms meet a more stringent requirement than minimax optimality alone. Not only must they provide optimal uniform guarantees on …
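
For reference, here is the subgradient-oracle model in question, with the classical method that attains the minimax-optimal uniform \(O(RG/\sqrt{T})\) guarantee; the subgame perfect algorithms the paper identifies refine this baseline and are not shown here.

```python
import numpy as np

def subgradient_method(oracle, x0, R, G, iters):
    """Plain subgradient method in the oracle model: each call to oracle(x)
    returns (f(x), g) with g a subgradient of f at x. Uses the classical
    fixed stepsize R / (G * sqrt(T)), which gives the O(RG/sqrt(T)) uniform
    guarantee for G-Lipschitz f with ||x0 - x*|| <= R.
    """
    x, best_x, best_f = x0.copy(), x0.copy(), np.inf
    step = R / (G * np.sqrt(iters))
    for _ in range(iters):
        f, g = oracle(x)
        if f < best_f:                 # track the best iterate seen so far
            best_f, best_x = f, x.copy()
        x = x - step * g               # fixed-stepsize subgradient step
    return best_x, best_f
```

For example, `oracle = lambda x: (np.abs(x).sum(), np.sign(x))` is a valid oracle for \(f(x) = \lVert x \rVert_1\).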

An Optimal Solution is Not Enough: Alternative Solutions and Optimal Power Systems

Power systems modeling and planning have long leveraged mathematical programming for its ability to provide optimality and feasibility guarantees. One feature recognized in the optimization literature since the 1970s is the existence and meaning of multiple exact optimal and near-optimal solutions, which we call alternative solutions. In power systems modeling, the use …

Exact Decentralized Optimization via Explicit $\ell_1$ Consensus Penalties

Consensus optimization enables autonomous agents to solve joint tasks through peer-to-peer exchanges alone. Classical decentralized gradient descent is appealing for its minimal state but fails to achieve exact consensus with fixed stepsizes unless additional trackers or dual variables are introduced. We revisit penalty methods and introduce a decentralized two-layer framework that couples an outer penalty-continuation …
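
A sketch of one inner iteration under an explicit \(\ell_1\) consensus penalty: each agent takes a fixed-stepsize step on its local gradient plus a subgradient of the penalty \(\rho \sum_{(i,j)} \lVert x_i - x_j \rVert_1\). Because the \(\ell_1\) penalty is exact, a finite \(\rho\) can enforce exact consensus at a fixed stepsize; the outer penalty-continuation layer the abstract mentions is omitted, and the update below is an illustration, not the paper's exact scheme.

```python
import numpy as np

def l1_penalty_step(X, grads, neighbors, rho, alpha):
    """One inner-loop step of decentralized gradient descent on the
    l1-penalized objective sum_i f_i(x_i) + rho * sum_{(i,j)} ||x_i - x_j||_1.

    X: (agents, dim) stacked local iterates; grads: (agents, dim) local
    gradients at X; neighbors: dict mapping agent id -> list of neighbor ids.
    """
    X_new = X.copy()
    for i, nbrs in neighbors.items():
        # Subgradient of the l1 consensus penalty at agent i.
        pen = sum(np.sign(X[i] - X[j]) for j in nbrs)
        X_new[i] = X[i] - alpha * (grads[i] + rho * pen)
    return X_new
```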