## Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $\mathcal{S}$ and a finite action space $\mathcal{A}$. We show that any randomized algorithm requires running time of at least $\Omega(|\mathcal{S}|^2|\mathcal{A}|)$ to compute an $\epsilon$-optimal policy with high probability. We consider two variants of the MDP where the …