We consider the problem of decision-making in Markov decision processes (MDPs) when the reward or transition probability parameters are not known with certainty. We consider an approach in which the decision-maker (DM) considers multiple models of the parameters for an MDP and wishes to find a policy that optimizes an objective function that considers the performance with respect to each model, such as maximizing the expected performance or maximizing worst-case performance. Existing solution methods rely on mixed-integer programming (MIP) formulations, but have previously been limited to small instances due to the computational complexity. In this article, we present branch-and-cut (B&C) and a custom branch-and-bound (B&B) solution methods that leverage the decomposable structure of the problem and allow for the solution of MDPs that consider many models of the parameters. Numerical experiments show that a customized implementation of B&B significantly outperforms the MIP-based solution methods and that the best choice of objective function depends on the DM's attitude towards parameter ambiguity.
Citation
Technical Report. University of Michigan, August 2019.