We present a new complexity result on solving the Markov decision problem (MDP) with $n$ states and a number of actions for each state, a special class of real-number linear programs with the Leontief matrix structure. We prove that, when the discount factor $\theta$ is strictly less than $1$, the problem can be solved in at most $O(n^{1.5}(\log\frac{1}{1-\theta}+\log n))$ classical interior-point method iterations and $O(n^{4}(\log\frac{1}{1-\theta}+\log n))$ arithmetic operations. Our method is a {\em combinatorial} interior-point method related to the work of Ye \cite{Ye90} and Vavasis and Ye \cite{VavasisYe}. To our knowledge, this is the first {\em strongly polynomial-time} algorithm for solving the MDP with fixed discount factor or interest rate.
Citation
Working Paper, Department of Management Science and Engineering, Stanford University, Stanford, CA 94305; September 2002 (revised October 2004).