New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes

In this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of …
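Since the abstract is truncated, a brief reminder of the method under study may help: the Polyak stepsize uses the known optimal value f* to set gamma_k = (f(x_k) - f*) / ||grad f(x_k)||^2. A minimal sketch on a toy quadratic (the test problem and function names here are illustrative, not the paper's):

```python
import numpy as np

def polyak_gd(f, grad_f, f_star, x0, iters=5000):
    """Gradient descent with the Polyak stepsize
    gamma_k = (f(x_k) - f*) / ||grad f(x_k)||^2 (requires knowing f*)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_f(x)
        gnorm2 = g @ g
        if gnorm2 < 1e-30:           # stationary point reached
            break
        x = x - ((f(x) - f_star) / gnorm2) * g
    return x

# Toy strongly convex quadratic with known minimum value f* = 0.
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
x_final = polyak_gd(f, grad_f, 0.0, [1.0, 1.0])
```

The stepsize needs no tuning, but its reliance on f* is exactly why relaxations and universality questions, as studied in the paper, are of interest.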

Subsampled cubic regularization method with distinct sample sizes for function, gradient, and Hessian

We develop and study a subsampled cubic regularization method for finite-sum composite optimization problems, in which the function, gradient, and Hessian are estimated using possibly different sample sizes. By allowing each quantity to have its own sampling strategy, the proposed method offers greater flexibility to control the accuracy of the model components and to better …
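To illustrate the distinct-sample-size idea (this is a simplified sketch, not the paper's algorithm): on a least-squares finite sum, each model component gets its own independent subsample, and one cubic-regularized step is then computed by minimizing the model with plain gradient descent. All sample sizes and the regularization weight sigma are placeholder values.

```python
import numpy as np

# Finite sum f(x) = (1/n) sum_i 0.5 * (a_i' x - b_i)^2.
rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
x = rng.normal(size=d)

# Independent subsamples of *different* sizes for each model component.
i_f = rng.choice(n, 400, replace=False)   # function sample
i_g = rng.choice(n, 200, replace=False)   # gradient sample
i_h = rng.choice(n, 50, replace=False)    # Hessian sample
r = A[i_f] @ x - b[i_f]
f_est = 0.5 * np.mean(r ** 2)
g_est = A[i_g].T @ (A[i_g] @ x - b[i_g]) / len(i_g)
H_est = A[i_h].T @ A[i_h] / len(i_h)

# Cubic model m(s) = f + g's + 0.5 s'Hs + (sigma/3)||s||^3;
# its gradient is g + H s + sigma * ||s|| * s.
sigma = 1.0
s = np.zeros(d)
for _ in range(2000):
    ms = g_est + H_est @ s + sigma * np.linalg.norm(s) * s
    s -= 0.05 * ms
x_next = x + s
```

In the actual method the sample sizes would be chosen adaptively to control the error of each estimate, which is the point of allowing them to differ.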

A spatial branch-and-price-and-cut algorithm for finding globally optimal solutions to the continuous network design problem

Transportation network design, the problem of optimizing infrastructure for a societal goal while individual travelers optimize their behavior according to their own preferences, arises frequently in many contexts. However, it is also an NP-hard problem due to its leader-follower, or bi-level, structure, involving a follower objective that is different from yet significantly affects the …

A Taxonomy of Multi-Objective Alignment Techniques for Large Language Models

Aligning large language models (LLMs) with human preferences has evolved from single-objective reward maximization to sophisticated multi-objective optimization. Real-world deployment requires balancing competing objectives (helpfulness, harmlessness, honesty, instruction-following, and task-specific capabilities) that often conflict. This survey provides a systematic taxonomy of multi-objective alignment techniques, organizing the rapidly growing literature into four categories: (1) Reward Decomposition approaches that …

Artificial Intelligence in Supply Chain Optimization: A Systematic Review of Machine Learning Models, Methods, and Applications

Modern supply chains face mounting uncertainty and scale, motivating the integration of Artificial Intelligence (AI) and Machine Learning (ML) with mathematical optimization to enable robust and adaptive decisions. We present a systematic review of 199 articles on tangible supply chains, categorizing how ML is used—primarily for parameter estimation and for solution generation—and proposing a taxonomy …

Constraint Decomposition for Multi-Objective Instruction-Following in Large Language Models

Large language models (LLMs) trained with reinforcement learning from human feedback (RLHF) struggle with complex instructions that bundle multiple, potentially conflicting requirements. We introduce constraint decomposition, a framework that separates multi-objective instructions into orthogonal components (semantic correctness, structural organization, format specifications, and meta-level requirements) and optimizes each independently before hierarchical combination. Our approach addresses …

Data-Dependent Complexity of First-Order Methods for Binary Classification

Large-scale problems in data science are often modeled with optimization, and the optimization model is usually solved with first-order methods that may converge at a sublinear rate. Therefore, it is of interest to terminate the optimization algorithm as soon as the underlying data science task is accomplished. We consider FISTA for solving two binary classification …
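The early-termination idea can be sketched as follows (a toy illustration, not the paper's setup): run FISTA on a smooth logistic loss over a small separable dataset, but stop as soon as every training point is correctly classified rather than at optimization convergence. The dataset and stopping rule here are purely illustrative.

```python
import numpy as np

# Tiny linearly separable dataset (illustrative).
X = np.array([[2., 1.], [1., 2.], [-2., -1.], [-1., -2.], [3., 0.], [-3., 0.]])
y = np.array([1., 1., -1., -1., 1., -1.])
n = len(y)

def grad(w):
    # Gradient of (1/n) sum_i log(1 + exp(-y_i x_i' w)).
    z = y * (X @ w)
    return -(X.T @ (y / (1.0 + np.exp(z)))) / n

L = np.linalg.norm(X, 2) ** 2 / (4 * n)   # Lipschitz constant of the gradient
w = np.zeros(2)
v = w.copy()
t = 1.0
iters_used = None
for k in range(1000):
    w_new = v - grad(v) / L               # gradient step at the momentum point
    t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    v = w_new + ((t - 1.0) / t_new) * (w_new - w)
    w, t = w_new, t_new
    if np.all(np.sign(X @ w) == y):       # classification task done: stop early
        iters_used = k + 1
        break
```

On this toy problem the data-dependent stopping point arrives long before any optimality tolerance would be met, which is the phenomenon the complexity analysis quantifies.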

A Dual Riemannian ADMM Algorithm for Low-Rank SDPs with Unit Diagonal

This paper proposes a dual Riemannian alternating direction method of multipliers (ADMM) for solving low-rank semidefinite programs with unit diagonal constraints. We recast the ADMM subproblem as a Riemannian optimization problem over the oblique manifold by performing the Burer-Monteiro factorization. Global convergence of the algorithm is established assuming that the subproblem is solved to certain …
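The geometric ingredient can be shown in a few lines (a simplification: the snippet below is a plain Riemannian projected-gradient loop on min <C, VV'>, not the paper's dual ADMM). With the Burer-Monteiro factorization X = V V' and each row of V kept at unit norm, V lives on the oblique manifold and diag(X) = 1 holds by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
C = rng.normal(size=(n, n))
C = (C + C.T) / 2                        # symmetric cost matrix

def retract(V):
    # Normalize each row: projection retraction onto the oblique manifold.
    return V / np.linalg.norm(V, axis=1, keepdims=True)

V = retract(rng.normal(size=(n, p)))
for _ in range(200):
    G = 2 * C @ V                        # Euclidean gradient of <C, V V^T>
    # Tangent projection: remove each row's component along itself.
    rgrad = G - np.sum(G * V, axis=1, keepdims=True) * V
    V = retract(V - 0.02 * rgrad)

X = V @ V.T                              # PSD with unit diagonal by construction
```

Handling the constraint through the manifold, rather than through projections onto the SDP feasible set, is what makes the low-rank subproblems cheap.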

A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing …
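A toy sketch of what "auxiliary-loss-free" balancing means in practice (a common bias-based heuristic, assumed here for illustration; constants and the update rule are placeholders, not the paper's): instead of adding a balancing loss, keep a per-expert bias that is nudged up for underloaded experts and down for overloaded ones, and add it to the router scores before the top-1 argmax.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens, experts = 64, 4
scores = rng.normal(size=(tokens, experts))    # router logits per token
bias = np.zeros(experts)

for _ in range(50):
    assign = np.argmax(scores + bias, axis=1)  # top-1 expert per token
    load = np.bincount(assign, minlength=experts)
    # Nudge biases toward the uniform load tokens/experts: overloaded
    # experts get a lower bias, underloaded ones a higher bias.
    bias -= 0.05 * (load - tokens / experts) / (tokens / experts)
```

The bias only shifts routing decisions and never enters the training loss, which is precisely the property a theoretical analysis of such schemes has to contend with.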

Semidefinite programming via Projective Cutting Planes for dense (easily-feasible) instances

The cone of positive semidefinite (PSD) matrices can be described by an infinite number of linear constraints. It is well known that one can optimize over such a feasible region by standard Cutting Planes, but work on this idea remains rare, likely due to its limited practical appeal compared to Interior Point Methods (IPMs). …
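The separation step underlying any cutting-plane view of the PSD cone is standard and worth recalling (a minimal sketch; the paper's Projective Cutting Planes method does more than this): if X is not PSD, the eigenvector v for its smallest eigenvalue gives a linear cut <vv', X> >= 0 that the current X violates.

```python
import numpy as np

def psd_cut(X):
    """Return a violated linear cut for the PSD cone, or None if X is PSD.

    The cut is the matrix v v^T of the most negative eigenvector v:
    <v v^T, X> >= 0 is valid for every PSD matrix but fails for this X.
    """
    w, V = np.linalg.eigh(X)      # eigenvalues in ascending order
    if w[0] >= 0:
        return None               # X is already PSD: no cut exists
    v = V[:, 0]
    return np.outer(v, v)

X = np.array([[1.0, 2.0],
              [2.0, 1.0]])        # eigenvalues 3 and -1: not PSD
cut = psd_cut(X)
```

Each such cut is linear in the entries of X, so accumulating them yields an ordinary LP relaxation of the SDP, which is what the cutting-plane approach iteratively refines.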