stochastic primal-dual methods – Optimization Online

Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

Published: 2016/12/07

Yichen Chen
Mengdi Wang

Dynamic Programming, reinforcement learning, stochastic primal-dual methods

We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods …
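The minimax view mentioned in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract is truncated in this listing, so the discounted setting, step sizes, projections, and sampling scheme below are all assumptions made for illustration. The sketch uses the standard linear-programming form of the discounted Bellman equation, whose Lagrangian is L(v, lam) = (1 - gamma) * xi'v + sum over (s, a) of lam(s, a) * (r(s, a) + gamma * E[v(s')] - v(s)), with v the value estimate and lam a dual variable that plays the role of an unnormalized policy. Each observed transition triggers a descent step on a few coordinates of v and an ascent step on one coordinate of lam. The function name spd_sketch, the uniform initial distribution xi, the uniform sampling of (s, a), and the toy MDP are hypothetical.

```python
import numpy as np


def spd_sketch(P, r, gamma=0.9, steps=5000, alpha=0.05, eta=0.01, seed=0):
    """Illustrative stochastic primal-dual iteration for a discounted MDP.

    P: transition probabilities, shape (S, A, S). r: rewards in [0, 1], shape (S, A).
    All hyperparameters are assumptions chosen for this toy sketch.
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    K = S * A                        # importance weight for uniform (s, a) sampling
    v = np.zeros(S)                  # primal variable: value estimate
    lam = np.full((S, A), 1.0 / K)   # dual variable on the simplex
    lam_sum = np.zeros((S, A))
    v_max = 1.0 / (1.0 - gamma)      # upper bound on values when rewards lie in [0, 1]

    for _ in range(steps):
        # Observe one transition (s, a, s_next); sample s0 from the
        # (assumed uniform) initial distribution xi.
        s = rng.integers(S)
        a = rng.integers(A)
        s_next = rng.choice(S, p=P[s, a])
        s0 = rng.integers(S)

        # Sampled Bellman residual at the current value estimate.
        delta = r[s, a] + gamma * v[s_next] - v[s]

        # Primal descent: only coordinates s0, s_next, and s of v change.
        w = K * lam[s, a]
        v[s0] -= alpha * (1.0 - gamma)
        v[s_next] -= alpha * w * gamma
        v[s] += alpha * w
        np.clip(v, 0.0, v_max, out=v)

        # Dual ascent: exponentiated update of one coordinate of lam,
        # followed by renormalization onto the simplex.
        lam[s, a] *= np.exp(eta * K * delta)
        lam /= lam.sum()
        lam_sum += lam

    # Read a policy off the averaged dual iterate.
    lam_avg = lam_sum / steps
    policy = lam_avg / lam_avg.sum(axis=1, keepdims=True)
    return v, policy


# Toy MDP (hypothetical): two states, two actions, uniform transitions;
# action 0 always pays reward 1 and action 1 pays 0.
P_toy = np.full((2, 2, 2), 0.5)
r_toy = np.array([[1.0, 0.0], [1.0, 0.0]])
v_est, policy_est = spd_sketch(P_toy, r_toy)
print(policy_est)
```

In this toy problem the averaged dual variable should put more weight on the rewarding action in each state. The renormalization of lam touches every coordinate, so a truly coordinate-wise implementation would need a different projection; the sketch keeps it for simplicity.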