Lipschitz Gradient Guarantees for Probability Functions and a New Algorithm for Probability Maximization

This work studies probability functions that appear in stochastic programming models. Although their differentiability has been widely investigated, the Lipschitz continuity of their gradients, crucial for the design and analysis of modern optimization algorithms, has received little attention. We develop a general framework that ensures differentiability and gradient Lipschitz continuity under practical conditions. Our framework … Read more

Distributionally Robust Optimization via Targeted Integral Probability Metrics for General Data Processes

Distributionally robust optimization (DRO) provides a principled framework for decision-making under distributional uncertainty. Existing classical data-driven DRO frameworks typically construct ambiguity sets from distributional information, such as moment constraints, divergence neighborhoods, or Wasserstein balls, specified before the downstream loss is considered. We propose a task-aware DRO framework based on targeted integral probability metrics. The ambiguity … Read more

Stochastic Gradient Methods with Online Scaling

This paper introduces Stochastic Online Scaled Gradient Methods (SOSGM), a generalization of the recently developed adaptive preconditioning framework in \cite{gao2025gradient,chu2025gradient} to stochastic optimization. Under standard assumptions, we establish convergence guarantees for SOSGM using large batchsize or variance reduction. SOSGM is compatible with popular diagonal and/or low-rank preconditioners as well as heavy-ball momentum, while maintaining memory … Read more

Optimality Gap of Tailored Base-Surge Policies Decays Exponentially in Regular-Source Lead Times for Dual-Sourcing Models

This paper resolves an open problem posed in the literature by proving that, in dual-sourcing inventory systems, the optimality gap of tailored base-surge (TBS) policies decays exponentially with the regular source lead time, with the express-source lead time fixed. In contrast to the existing approach, which relies on conditional Jensen inequalities and a vanishing-discount argument … Read more

Probabilistic analysis of dual decomposition on two-stage stochastic integer programs

Two-stage stochastic integer programs provide a powerful framework for modeling decision-making under uncertainty, but they are notoriously difficult to solve at scale due to their high dimensionality and intrinsic nonconvexity. Decomposition-based algorithms such as Benders methods and Branch-and-Price (related dual decomposition methods) have become standard computational approaches for such problems and demonstrate excellent empirical performance … Read more

Pseudo-Compact Formulations and Branch-and-Cut Approaches for the Capacitated Vehicle Routing Problem with Stochastic Demands

In this paper, we address the Capacitated Vehicle Routing Problem with Stochastic Demands (CVRPSD), in which routes are planned a priori and recourse actions are performed to ensure demand fulfillment. These recourse actions are defined through policies and may include replenishment trips or demand backlogging subject to penalties. We develop the first family of pseudo-compact … Read more

Multistage Conditional Compositional Optimization

We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions. It has numerous applications and arises, for example, in optimal stopping, linear-quadratic regulator problems, distributionally robust contextual bandits, … Read more

A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muon

A unified framework for first-order optimization algorithms for nonconvex unconstrained optimization is proposed that uses adaptively preconditioned gradients and includes popular methods such as full and diagonal AdaGrad, AdaNorm, as well as adpative variants of Shampoo and Muon. This framework also allows combining heterogeneous geometries across different groups of variables while preserving a unified convergence … Read more

Complexity of an inexact stochastic SQP algorithm for equality constrained optimization

In this paper, we consider nonlinear optimization problems with a stochastic objective function and deterministic equality constraints. We propose an inexact two-stepsize stochastic sequential quadratic programming (SQP) algorithm and analyze its worst-case complexity under mild assumptions. The method utilizes a step decomposition strategy and handles stochastic gradient estimates by assigning different stepsizes to different components … Read more

Asymptotic Consistency of Data-Driven Distributionally Robust Optimization via Reference-Distribution Convergence and Ambiguity-Set Shrinkage

We study asymptotic consistency of data-driven distributionally robust optimization with shrinking ambiguity sets. The analysis separates reference-distribution convergence from ambiguity-set shrinkage on a prescribed test-function class. Under compactness and continuity assumptions, this yields uniform convergence of robust objectives, optimal-value convergence, and outer convergence of minimizers. For constrained DRO, the same mechanism gives uniform convergence of … Read more