Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time

We propose a randomized linear programming algorithm for approximating the optimal policy of the discounted Markov decision problem. By leveraging the value-policy duality, the algorithm adaptively samples state transitions and makes exponentiated primal-dual updates. We show that it finds an ε-optimal policy using nearly-linear running time in the worst case. For Markov decision processes that … Read more