A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing …
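
The abstract is truncated, but the mechanism being analyzed is easy to illustrate. Below is a minimal NumPy sketch of one auxiliary-loss-free balancing heuristic: a per-expert bias is added to the gating scores for top-k selection only and is nudged toward underloaded experts after each batch, in the spirit of recent loss-free balancing strategies. The gating scores, update rate, and sizes are illustrative assumptions, not the exact scheme analyzed in the paper.

```python
import numpy as np

def route_tokens(scores, bias, k=2):
    """Pick top-k experts per token from bias-adjusted gating scores.
    The bias only steers *selection*; mixture weights would still use the
    raw scores, which is what makes the balancing auxiliary-loss-free."""
    adjusted = scores + bias
    return np.argsort(-adjusted, axis=1)[:, :k]

def update_bias(bias, topk, num_experts, lr=0.01):
    """Raise the bias of underloaded experts and lower it for overloaded
    ones, instead of adding a balancing term to the training loss."""
    load = np.bincount(topk.ravel(), minlength=num_experts).astype(float)
    target = topk.size / num_experts          # perfectly balanced load
    return bias + lr * np.sign(target - load)

# toy demo: expert 0 is systematically favored by the gating scores
rng = np.random.default_rng(0)
num_tokens, num_experts = 512, 8
bias = np.zeros(num_experts)
for _ in range(200):
    scores = rng.normal(size=(num_tokens, num_experts))
    scores[:, 0] += 1.0
    topk = route_tokens(scores, bias)
    bias = update_bias(bias, topk, num_experts)
print("per-expert load:", np.bincount(topk.ravel(), minlength=num_experts))
```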

Two-Stage Data-Driven Contextual Robust Optimization: An End-to-End Learning Approach for Online Energy Applications

Traditional end-to-end contextual robust optimization models are trained for specific contextual data, requiring complete retraining whenever new contextual information arrives. This limitation hampers their use in online decision-making problems such as energy scheduling, where multiperiod optimization must be solved every few minutes. In this paper, we propose a novel Data-Driven Contextual Uncertainty Set, which gives …

A Sound Local Regret Methodology for Online Nonconvex Composite Optimization

Online nonconvex optimization addresses dynamic and complex decision-making problems arising in real-world tasks where the optimizer’s objective evolves with the intricate and changing nature of the underlying system. This paper studies an online nonconvex composite optimization model with limited first-order access, encompassing a wide range of practical scenarios. We define local regret using a …
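
The definition is truncated here, but a standard reference point in this literature is the time-smoothed (windowed) local regret of Hazan, Singh, and Zhang (2017), shown below for orientation; a composite model with limited first-order access would presumably adapt it, e.g. by replacing the gradient with a proximal gradient mapping:

\[
\mathfrak{R}_w(T) \;=\; \sum_{t=1}^{T} \bigl\lVert \nabla F_{t,w}(x_t) \bigr\rVert^2,
\qquad
F_{t,w}(x) \;=\; \frac{1}{w} \sum_{i=0}^{w-1} f_{t-i}(x),
\]

with the convention \(f_{t-i} \equiv 0\) for \(t-i \le 0\).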

A Simple Algorithm for Online Decision Making

Motivated by recent progress on online linear programming (OLP), we study the online decision making problem (ODMP) as a natural generalization of OLP. In ODMP, a single decision maker makes a sequence of decisions over \(T\) time stages. At each time stage, the decision maker makes a …
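
For orientation, here is a hedged sketch of the kind of simple dual-price policy that motivates this line of work on OLP: estimate a price on the shared resource from the history and accept an arrival only if its reward covers the priced-out resource consumption. The single-resource setup, price update, and data are illustrative assumptions, not the algorithm proposed in the paper.

```python
import numpy as np

def online_lp_dual_policy(rewards, sizes, capacity):
    """Accept/reject arrivals using a running dual price.

    Accept iff reward >= price * size and capacity remains; then adjust
    the price: if capacity is being consumed faster than time elapses,
    raise it (a crude history-based update; real OLP analyses are more
    careful about the step size and the price estimate)."""
    T = len(rewards)
    remaining, price, total = capacity, 0.0, 0.0
    for t in range(T):
        if sizes[t] <= remaining and rewards[t] >= price * sizes[t]:
            remaining -= sizes[t]
            total += rewards[t]
        used_frac = 1.0 - remaining / capacity
        elapsed_frac = (t + 1) / T
        price = max(price + 0.1 * (used_frac - elapsed_frac), 0.0)
    return total

rng = np.random.default_rng(1)
T = 1000
rewards = rng.uniform(0, 1, T)
sizes = rng.uniform(0, 1, T)
print("collected reward:", online_lp_dual_policy(rewards, sizes, capacity=100.0))
```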

Compromise Policy for Multi-stage Stochastic Linear Programming: Variance and Bias Reduction

This paper focuses on algorithms for multi-stage stochastic linear programming (MSLP). We propose an ensemble method named the “compromise policy”, which not only reduces the variance of the function approximation but also reduces the bias of the estimated optimal value. It provides a tight lower bound estimate with a confidence interval. By exploiting parallel computing, …
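
The lower-bound-with-confidence-interval part is straightforward to illustrate: each parallel replication yields a statistical lower bound on the MSLP optimal value, and the bounds are aggregated into a mean estimate with a confidence interval. A minimal sketch with stand-in numbers; building the compromise policy itself (aggregating the replications’ value-function approximations) is beyond this snippet.

```python
import numpy as np
from scipy import stats

def lower_bound_ci(replication_bounds, alpha=0.05):
    """Aggregate per-replication lower-bound estimates of the optimal
    value into a mean with a (1 - alpha) t-based confidence interval,
    as in sample-based bounds for stochastic programs."""
    x = np.asarray(replication_bounds, dtype=float)
    m = x.size
    mean = x.mean()
    half = stats.t.ppf(1 - alpha / 2, df=m - 1) * x.std(ddof=1) / np.sqrt(m)
    return mean - half, mean, mean + half

# stand-in for M parallel replications, each producing a lower bound
# (e.g., the final bound of a run on an independent sample tree)
rng = np.random.default_rng(2)
bounds = 100.0 + rng.normal(scale=1.5, size=30)   # illustrative numbers
lo, mean, hi = lower_bound_ci(bounds)
print(f"lower-bound estimate {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```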

Modeling Multi-stage Decision Making under Incomplete and Uncertain Information

We propose a new universal framework for multi-stage decision making under limited information availability. It is developed as part of a larger research project that aims to provide analytical methods to compare and evaluate different models and algorithms for multi-stage decision making. In our setting, we have an open time horizon and limited information about …

Decentralized Online Integer Programming Problems with a Coupling Cardinality Constraint

We consider a problem involving a set of agents who need to coordinate their actions to optimize the sum of their objectives while satisfying a common resource constraint. The objective functions of the agents are unknown to them a priori and are revealed in an online manner. The resulting problem is an online optimization problem …

Dynamic Relaxations for Online Bipartite Matching

Online bipartite matching (OBM) is a fundamental model underpinning many important applications, including search engine advertisement, website banner and pop-up ads, and ride-hailing. We study the i.i.d. OBM problem, where one side of the bipartition is fixed and known in advance, while nodes from the other side appear sequentially as i.i.d. realizations of an underlying …
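
As a baseline for the i.i.d. setting described here, consider greedy matching: offline nodes are fixed, each online arrival draws its type i.i.d. from a known distribution, and the arrival is matched to any still-unmatched neighbor. A minimal sketch with an illustrative type graph; the paper’s dynamic relaxations are policies designed to improve on such baselines.

```python
import numpy as np

def greedy_iid_obm(adjacency, type_probs, T, rng):
    """Greedy matching for i.i.d. online bipartite matching.

    adjacency[j] lists the offline neighbors of online type j.
    Each arrival draws a type from type_probs; greedy matches it to the
    first still-unmatched neighbor (a simple baseline, not the paper's
    relaxation-based policy)."""
    matched, value = set(), 0
    for _ in range(T):
        j = rng.choice(len(type_probs), p=type_probs)
        for i in adjacency[j]:
            if i not in matched:
                matched.add(i)
                value += 1
                break
    return value

rng = np.random.default_rng(3)
adjacency = {0: [0, 1], 1: [1, 2], 2: [2, 3]}   # illustrative type graph
type_probs = [0.5, 0.3, 0.2]
print("greedy matches:", greedy_iid_obm(adjacency, type_probs, T=10, rng=rng))
```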

Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization

We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as the $\ell_1$-norm for promoting sparsity. We develop extensions of Nesterov’s dual averaging method that can exploit the …
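
For the $\ell_1$-norm regularizer mentioned here, the regularized dual averaging update has a well-known closed form (Xiao, 2010): average all past (sub)gradients, soft-threshold the average, and rescale. A minimal sketch on a toy least-squares stream; the data, $\lambda$, and $\gamma$ are illustrative choices.

```python
import numpy as np

def l1_rda_step(g_bar, t, lam, gamma):
    """One l1-RDA update (Xiao, 2010):
        w[i] = 0                                    if |g_bar[i]| <= lam
             = -(sqrt(t)/gamma) * (g_bar[i] - lam * sign(g_bar[i]))  else,
    i.e. soft-threshold the averaged gradient, then scale."""
    shrunk = np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
    return -(np.sqrt(t) / gamma) * shrunk

# toy online least-squares stream with a sparse ground truth
rng = np.random.default_rng(4)
dim, T = 20, 5000
w_true = np.zeros(dim)
w_true[:3] = [1.0, -2.0, 0.5]

w = np.zeros(dim)
g_bar = np.zeros(dim)
for t in range(1, T + 1):
    x = rng.normal(size=dim)
    y = x @ w_true + 0.1 * rng.normal()
    g = (x @ w - y) * x                  # loss gradient at the current iterate
    g_bar += (g - g_bar) / t             # running average of all gradients
    w = l1_rda_step(g_bar, t, lam=0.05, gamma=5.0)

print("nonzeros recovered:", np.nonzero(np.round(w, 2))[0])
print("estimate on support:", np.round(w[:3], 2))
```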