Finite-Sample Optimality and Constraint Satisfaction: Learning-Based Optimal Control in Dynamic Dispatch Networks

Dynamic dispatch networks in logistics and transportation require real-time, constraint-aware decision-making under stochastic demand. This paper bridges mathematical optimization, optimal control theory, and reinforcement learning by establishing non-asymptotic theoretical guarantees for learning-based optimal control in constrained stochastic dispatch systems. We formulate the problem as a constrained Markov decision process, enforce feasibility via a projection-based policy architecture, and derive finite-sample convergence rates, explicit constraint violation bounds, and Input-to-State Stability (ISS) certificates. Our analysis proves an \(O(1/\sqrt{K})\) optimality gap decay with high-probability feasibility guarantees, matching minimax lower bounds for non-convex stochastic policy optimization. Numerical experiments across ride-hailing, last-mile delivery, and line-haul freight environments validate the theoretical predictions, demonstrating sample-efficient convergence, robust constraint adherence, and operational KPI improvements without environment-specific tuning.
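The abstract's projection-based policy architecture is not detailed here; as a minimal illustrative sketch (not the paper's method), the following assumes a dispatch action must satisfy a nonnegativity and total-capacity constraint, and applies an exact Euclidean projection (the standard sort-based simplex projection) to whatever raw action a learned policy emits, so every executed action is feasible by construction:

```python
import numpy as np

def project_capacity(a: np.ndarray, cap: float) -> np.ndarray:
    """Euclidean projection of a raw action a onto {x >= 0, sum(x) <= cap}.

    If clipping negatives already satisfies the capacity budget, that clip
    is the projection; otherwise the budget is active and we project onto
    the scaled simplex {x >= 0, sum(x) == cap} via the sort-based method.
    """
    x = np.maximum(a, 0.0)
    if x.sum() <= cap:
        return x
    u = np.sort(a)[::-1]                      # sorted descending
    css = np.cumsum(u)                        # running sums
    k = np.arange(1, len(a) + 1)
    rho = np.nonzero(u - (css - cap) / k > 0)[0][-1]  # last active index
    theta = (css[rho] - cap) / (rho + 1)      # shared shift
    return np.maximum(a - theta, 0.0)

# Example: a raw policy output exceeding the capacity budget is mapped
# to the nearest feasible action.
raw = np.array([1.0, 1.0])
feasible = project_capacity(raw, cap=1.0)     # -> array([0.5, 0.5])
```

A projection layer like this is differentiable almost everywhere, which is what allows policy-gradient updates to pass through it while the executed actions remain feasible at every training step; the hard-constraint guarantees claimed in the abstract would rest on this kind of feasibility-by-construction design.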
