A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing …
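
To make the setting concrete, here is a minimal sketch of one auxiliary-loss-free balancing scheme: a top-k router whose selection scores are shifted by a per-expert bias that is nudged against the observed load. The function names (`route_tokens`, `update_bias`), the update rate `u`, and the sign-based update rule are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def route_tokens(scores, expert_bias, k=2):
    """Select top-k experts per token from bias-adjusted affinity scores.

    scores:      (num_tokens, num_experts) router affinities
    expert_bias: (num_experts,) balancing bias, used only for selection
    """
    adjusted = scores + expert_bias              # bias steers routing, not combination weights
    return np.argsort(-adjusted, axis=1)[:, :k]  # chosen expert indices per token

def update_bias(expert_bias, chosen, num_experts, u=1e-3):
    """Nudge biases toward balance: overloaded experts down, underloaded experts up."""
    load = np.bincount(chosen.ravel(), minlength=num_experts)
    return expert_bias - u * np.sign(load - load.mean())

# toy usage (hypothetical shapes and values)
rng = np.random.default_rng(0)
scores = rng.normal(size=(16, 8))
bias = np.zeros(8)
for _ in range(100):
    chosen = route_tokens(scores, bias)
    bias = update_bias(bias, chosen, num_experts=8)
```

In a multi-device setting the per-expert load counts would presumably be aggregated across devices before the bias update; that detail is omitted here.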

Parallel Interval Continuous Global Optimization Algorithms

We theoretically study, on a distributed-memory architecture, the parallelization of Hansen’s algorithm for continuous global optimization with inequality constraints, using interval arithmetic. We propose a parallel algorithm based on a dynamic redistribution of the working list among the processors. We also exploit the reduction technique, developed by Hansen, for computing …
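
For context, below is a minimal serial sketch of the interval branch-and-bound step that Hansen-style algorithms build on: bisect a box, bound the objective with an interval extension, and discard boxes whose lower bound exceeds the best known upper bound. The toy objective `f_interval`, the tolerance `tol`, and the heap-ordered working list are illustrative assumptions; the dynamic redistribution of the working list across processors, the inequality constraints, and Hansen’s reduction technique discussed in the abstract are not shown.

```python
import heapq

def f_interval(box):
    """Natural interval extension of the toy objective f(x, y) = x**2 + y**2."""
    sq = lambda l, u: (0.0 if l <= 0.0 <= u else min(l*l, u*u), max(l*l, u*u))
    (xl, xh), (yl, yh) = box
    xs, ys = sq(xl, xh), sq(yl, yh)
    return xs[0] + ys[0], xs[1] + ys[1]

def bisect(box):
    """Split the box along its widest dimension."""
    i = max(range(len(box)), key=lambda j: box[j][1] - box[j][0])
    l, u = box[i]
    mid = 0.5 * (l + u)
    left, right = list(box), list(box)
    left[i], right[i] = (l, mid), (mid, u)
    return tuple(left), tuple(right)

def interval_minimize(box, tol=1e-6):
    """Serial branch-and-bound: keep boxes whose lower bound may beat the best upper bound."""
    lo0, hi0 = f_interval(box)
    best_upper = hi0
    work = [(lo0, box)]                      # working list ordered by lower bound
    while work:
        lower, b = heapq.heappop(work)
        if lower > best_upper:               # box cannot contain the global minimum
            continue
        if max(u - l for l, u in b) < tol:
            return lower, b
        for child in bisect(b):
            lo, hi = f_interval(child)
            best_upper = min(best_upper, hi)
            if lo <= best_upper:
                heapq.heappush(work, (lo, child))
    return best_upper, box

print(interval_minimize(((-2.0, 1.5), (-1.0, 2.0))))
```

In a distributed-memory version, the working list maintained by the heap above would be partitioned among processors and rebalanced dynamically as boxes are discarded, which is the aspect the abstract focuses on.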