Randomized Linear Programming Solves the Discounted Markov Decision Problem in Nearly-Linear (Sometimes Sublinear) Running Time

We propose a randomized linear programming algorithm for approximating the optimal policy of the discounted Markov decision problem. By leveraging the value-policy duality, the algorithm adaptively samples state transitions and makes exponentiated primal-dual updates. We show that it finds an ε-optimal policy using nearly-linear running time in the worst case. For Markov decision processes that are ergodic under every stationary policy, we show that the algorithm finds an ε-optimal policy using running time linear in the total number of state-action pairs, which is sublinear in the input size. These results provide new complexity benchmarks for solving stochastic dynamic programs.
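To make the approach concrete, below is a minimal stochastic primal-dual sketch in the spirit of the abstract: the discounted MDP's linear program is treated as a bilinear saddle-point problem between a value vector (primal) and a distribution over state-action pairs (dual); each iteration adaptively samples one state-action pair from the current dual iterate, samples a single transition, and applies an exponentiated update to the dual alongside a projected gradient step on the primal. Everything here is illustrative, not the paper's exact algorithm: the function name, step sizes, iteration count, and the simplified (non-importance-weighted) dual update are all assumptions of this sketch.

```python
import numpy as np

def primal_dual_mdp(P, r, gamma, q, iters=200_000, alpha=0.01, beta=0.01, seed=0):
    """Stochastic primal-dual sketch for the discounted-MDP linear program.

    P     : (S, A, S) array of transition probabilities P[s, a, s'].
    r     : (S, A) array of rewards, assumed to lie in [0, 1].
    gamma : discount factor in (0, 1).
    q     : (S,) initial-state distribution.
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    v = np.zeros(S)                      # primal: value estimates
    mu = np.full((S, A), 1.0 / (S * A))  # dual: distribution over state-action pairs
    v_max = 1.0 / (1.0 - gamma)          # values stay in [0, 1/(1-gamma)]

    for _ in range(iters):
        # Adaptively sample a state-action pair from the current dual iterate,
        # then a single next state -- one transition sample per iteration.
        idx = rng.choice(S * A, p=mu.ravel())
        s, a = divmod(idx, A)
        s_next = rng.choice(S, p=P[s, a])

        # Sampled Bellman residual at (s, a): the dual gradient coordinate.
        delta = r[s, a] + gamma * v[s_next] - v[s]

        # Exponentiated (multiplicative-weight) update on the sampled dual
        # coordinate, then renormalization back onto the simplex. (The paper's
        # analyzed update is importance-weighted; this sketch omits that.)
        mu[s, a] *= np.exp(beta * delta)
        mu /= mu.sum()

        # Unbiased stochastic gradient step on the primal values, projected
        # onto the box [0, v_max].
        grad_v = q.copy()
        grad_v[s] -= 1.0
        grad_v[s_next] += gamma
        v = np.clip(v - alpha * grad_v, 0.0, v_max)

    # Read a greedy policy off the dual variables.
    return mu.argmax(axis=1), v
```

The exponentiated update keeps the dual iterate a probability distribution, which is exactly what makes the adaptive sampling step well defined and lets each iteration touch only a single sampled transition rather than the full model.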
