This paper presents a practical architecture for last-mile delivery routing at scales of up to one million stops under realistic operational constraints, including vehicle capacity, package volume, route stop limits, and time windows. Unlike conventional systems that require pre-partitioning or large-scale infrastructure, the proposed system solves the full fleet planning problem through a single coherent planning pipeline on commodity hardware. Physical feasibility constraints (vehicle capacity and full stop coverage) are strictly enforced. Soft operational constraints (time windows, route stop limits, route duration) are subject to measurable, bounded violation rates, in line with standard last-mile practice, where perfect compliance is not physically achievable under realistic travel-time variability.
The system combines parallel constraint-aware clustering, constraint-aware initial allocation, distributed neighbor-based rebalancing, and fast route-level optimization to produce fleet plans that preserve global coherence without a centralized monolithic planner. Evaluated on the public Amazon Last Mile Routing Research Challenge dataset under a shared external measurement protocol based on OSRM and Google Maps, the architecture reduces aggregate measured route distance by 23.3% and route count by 11.1% relative to the released Amazon baseline routes, with a mean depot-level distance reduction of 17.59%. The comparison uses the same set of stops and the same measurement procedure for both sets of routes, and it reflects differences in routing structure under an external metric. It makes no claim of equivalence with Amazon’s internal operational objective, which is not publicly specified.
In an extended scaling experiment, the same system processes one million stops in approximately 20 minutes on a commodity laptop, exhibiting near-linear empirical runtime growth. The contribution is architectural rather than theoretical: large-scale routing becomes tractable when computation is organized into bounded, composable stages, enabling efficient planning without input-size caps or specialized infrastructure.
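To make the scaling figure concrete, the following minimal sketch converts the reported run (one million stops in approximately 20 minutes) into a per-stop cost and, under the assumption that runtime grows exactly linearly, projects runtimes for other input sizes. The function names are hypothetical, and the projections are illustrative arithmetic only, not measured results.

```python
# Reference point taken from the text: ~1,000,000 stops in ~20 minutes.
REF_STOPS = 1_000_000
REF_MINUTES = 20.0


def per_stop_ms(total_stops: int, total_minutes: float) -> float:
    """Average wall-clock cost per stop, in milliseconds."""
    return total_minutes * 60_000 / total_stops


def projected_minutes(stops: int,
                      ref_stops: int = REF_STOPS,
                      ref_minutes: float = REF_MINUTES) -> float:
    """Runtime projection ASSUMING perfectly linear scaling.

    The text reports near-linear empirical growth, so this is only an
    approximation; real runs may deviate, especially far from the
    reference size.
    """
    return ref_minutes * stops / ref_stops


if __name__ == "__main__":
    print(f"per-stop cost: {per_stop_ms(REF_STOPS, REF_MINUTES):.2f} ms")
    for n in (100_000, 250_000, 500_000):
        print(f"{n:>9,} stops -> ~{projected_minutes(n):.1f} min (linear assumption)")
```

At the reported reference point this works out to roughly 1.2 ms per stop on average.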