Analysis of Monotone Policy Iteration for Interpretable Policies in Markov Decision Processes: Impact of State Ordering Rules

Optimizing interpretable policies for Markov Decision Processes (MDPs) can be computationally intractable for large-scale MDPs. For monotone policies, for example, the optimal interpretable policy depends on the initial state distribution, which precludes standard dynamic programming techniques. Previous work has proposed Monotone Policy Iteration (MPI) to produce a feasible solution for warm-starting a Mixed Integer Linear Program …