Iterative Sampling Methods for Sinkhorn Distributionally Robust Optimization

Distributionally robust optimization (DRO) has emerged as a powerful paradigm for reliable decision-making under uncertainty. This paper focuses on DRO with ambiguity sets defined via the Sinkhorn discrepancy, an entropy-regularized Wasserstein distance; this setting is referred to as Sinkhorn DRO. Existing work primarily addresses Sinkhorn DRO from a dual perspective, leveraging its formulation as a conditional stochastic optimization …

An Elementary Proof of the Near Optimality of LogSumExp Smoothing

We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln\left(\sum_{i=1}^d \exp(x_i)\right)$ provides a classical smoothing, differing from the max function in value by at most $\ln(d)$. We provide an elementary construction of a lower bound, establishing that every overestimating smoothing of the max function must …
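As a small numerical illustration of the bound stated in this abstract, the sketch below evaluates LogSumExp against the max function and checks that the gap lies between $0$ and $\ln(d)$. It is only a sanity check, not the paper's construction; the helper names `logsumexp` and `smoothing_gap` and the sample vectors are assumptions introduced here.

```python
import math


def logsumexp(x):
    """Numerically stable ln(sum_i exp(x_i)) for a non-empty list of floats."""
    m = max(x)
    # Subtracting the max before exponentiating avoids overflow.
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def smoothing_gap(x):
    """Difference LogSumExp(x) - max(x); it lies in [0, ln(d)] for d = len(x)."""
    return logsumexp(x) - max(x)


# Hypothetical sample inputs chosen for illustration.
x_equal = [2.0] * 8                  # all coordinates equal: gap attains ln(d)
x_spread = [5.0, -1.0, 0.5, -3.0]    # one dominant coordinate: gap is small

gap_equal = smoothing_gap(x_equal)
gap_spread = smoothing_gap(x_spread)
```

The gap reaches $\ln(d)$ exactly when all coordinates coincide, which is why $\ln(d)$ is the worst-case difference quoted above.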

A Taxonomy of Multi-Objective Alignment Techniques for Large Language Models

Aligning large language models (LLMs) with human preferences has evolved from single-objective reward maximization to sophisticated multi-objective optimization. Real-world deployment requires balancing competing objectives (helpfulness, harmlessness, honesty, instruction-following, and task-specific capabilities) that often conflict. This survey provides a systematic taxonomy of multi-objective alignment techniques, organizing the rapidly growing literature into four categories: (1) Reward Decomposition approaches that …

Data-Dependent Complexity of First-Order Methods for Binary Classification

Large-scale problems in data science are often modeled with optimization, and the optimization model is usually solved with first-order methods that may converge at a sublinear rate. Therefore, it is of interest to terminate the optimization algorithm as soon as the underlying data science task is accomplished. We consider FISTA for solving two binary classification …

A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing …

Preconditioned subgradient method for composite optimization: overparameterization and fast convergence

Composite optimization problems involve minimizing the composition of a smooth map with a convex function. Such objectives arise in numerous data science and signal processing applications, including phase retrieval, blind deconvolution, and collaborative filtering. The subgradient method achieves local linear convergence when the composite loss is well-conditioned. However, if the smooth map is, in a …

Machine Learning Algorithms for Assisting Solvers for Constraint Satisfaction Problems

This survey proposes a unifying conceptual framework and taxonomy that systematically integrates Machine Learning (ML) and Reinforcement Learning (RL) with classical paradigms for Constraint Satisfaction and Boolean Satisfiability solving. Unlike prior reviews that focus on individual applications, we organize the literature around solver architecture, linking each major phase—constraint propagation, heuristic decision-making, conflict analysis, and meta-level …

Machine Learning Algorithms for Assisting Solvers for Decision Optimization Problems

Combinatorial decision problems lie at the intersection of Operations Research (OR) and Artificial Intelligence (AI), encompassing structured optimization tasks such as submodular selection, dynamic programming, planning, and scheduling. These problems exhibit exponential growth in decision complexity, driven by interdependent choices coupled through logical, temporal, and resource constraints. Classical optimization frameworks—including integer programming, submodular optimization, and …

Closing the Gap: Efficient Algorithms for Discrete Wasserstein Barycenters

The Wasserstein barycenter problem seeks a probability measure that minimizes the weighted average of the Wasserstein distances to a given collection of probability measures. We study the discrete setting, where each measure has finite support — a regime that frequently arises in machine learning and operations research. The discrete Wasserstein barycenter problem is known to …

Optimizing pricing strategies through learning the market structure

This study explores the integration of market structure learning into pricing strategies to maximize revenue in e-commerce and retail environments. We consider the problem of determining the revenue-maximizing price of a single product in a market of heterogeneous consumers segmented by their product valuations, and analyze the pricing strategies for varying levels of prior …