A Stability Result for Linear Markov Decision Processes

In this paper, we propose a semi-metric for Markov processes that allows us to bound the optimal values of linear Markov Decision Processes (MDPs). Similar to existing notions of distance for general stochastic processes, our distance is based on transportation metrics. Apart from the specialization to MDPs, our contribution is to make the distance problem-specific, i.e., explicitly dependent on the data of the problem whose objective value we want to bound. As a result, we are able to consider problems with randomness in the constraints as well as in the objective function, and thereby relax an assumption made in the extant literature. We derive several properties of the proposed semi-metric and demonstrate its use in a stylized numerical example.

Citation

unpublished technical report, Technical University of Munich
