This work proposes a model-free reinforcement learning approach to a long-term fleet planning problem subject to air-travel demand uncertainty. The aim is to develop a dynamic fleet policy that adapts over time through intermediate assessments of the state. A Deep Q-network is trained to estimate the optimal fleet decisions from the airline and network conditions. An end-to-end learning set-up is developed in which an optimisation algorithm evaluates the fleet decisions by comparing the profit of the estimated fleet solution with that of the optimal fleet solution. The stochastic evolution of air-travel demand is sampled with an adaptation of the mean-reverting Ornstein-Uhlenbeck process, which adjusts the demand growth on each route for the general network-demand growth in order to capture network trends. A case study is presented for three demand scenarios for a small airline operating on a domestic US airport network. First, it is shown that the Deep Q-network can improve the predicted values of the fleet decisions by taking the air-travel demand as an input state. Second, the trained fleet policy is able to generate near-optimal fleet solutions and shows results comparable to a reference deterministic optimisation algorithm.
Part of the MSc thesis by Mathias de Koning, Delft University of Technology, March 2020
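
The mean-reverting demand sampling mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a plain Ornstein-Uhlenbeck process dx = theta (mu - x) dt + sigma dW discretised with the Euler-Maruyama scheme, and it assumes that each route's growth rate reverts towards the network growth rate plus a fixed route offset. The function names (`sample_ou_path`, `sample_route_growth`), the parameters (`theta`, `sigma`, `route_offsets`) and the coupling between route and network growth are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_ou_path(x0, mu, theta, sigma, n_steps, dt=1.0, rng=None):
    """Sample one path of dx = theta * (mu - x) dt + sigma dW (Euler-Maruyama)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        # Brownian increment over one step has standard deviation sqrt(dt).
        dw = rng.normal(0.0, np.sqrt(dt))
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + sigma * dw
    return x


def sample_route_growth(network_growth, route_offsets, theta, sigma, rng=None):
    """Sample per-route growth rates that revert towards the network trend.

    Assumption (not from the source): the reversion target of each route at
    step t is network_growth[t] + route_offsets[route], with a unit time step.
    """
    rng = np.random.default_rng() if rng is None else rng
    network_growth = np.asarray(network_growth, dtype=float)
    route_offsets = np.asarray(route_offsets, dtype=float)
    n_steps = len(network_growth) - 1
    growth = np.empty((n_steps + 1, len(route_offsets)))
    growth[0] = network_growth[0] + route_offsets
    for t in range(n_steps):
        target = network_growth[t] + route_offsets
        shock = sigma * rng.normal(size=len(route_offsets))
        growth[t + 1] = growth[t] + theta * (target - growth[t]) + shock
    return growth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical numbers: 10 yearly steps, 3% long-run network growth.
    network = sample_ou_path(x0=0.05, mu=0.03, theta=0.4, sigma=0.01,
                             n_steps=10, rng=rng)
    routes = sample_route_growth(network, route_offsets=[0.00, 0.01, -0.01],
                                 theta=0.4, sigma=0.005, rng=rng)
    print(routes.shape)
```

Setting `sigma` to zero removes the noise, so the path decays geometrically towards `mu`; this is a convenient check that the mean-reversion term has the intended sign.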