Identifying the Optimal Value Function of a Negative Markov Decision Process: An Integer Programming Approach

Mathematical programming formulation to identify the optimal value function of a negative Markov decision process (MDP) is non-convex, non-smooth, and computationally intractable. Also note that other well-known solution methods of MDP do not work properly for a negative MDP. More specifically, the policy iteration diverges, and the value iteration converges but does not provide an error bound in a finite number of iterations. To overcome this challenge, we develop a mixed integer linear programming (MILP) formulation using the theory of disjunctive programming. Then, we discuss its strength and how to extract the optimal value function and an optimal policy from this MILP formulation. Moreover, we exemplify our formulation for a class of stopping problems studied by Ross (1983). At the end, we briefly present another formulation developed by the traditional big-M method.

Citation

Dehghanian, A., "Identifying the Optimal Value Function of a Negative Markov Decision Process: An Integer Programming Approach", working paper, 2019.

Article

Download