Identifying the Optimal Value Function of a Negative Markov Decision Process: An Integer Programming Approach

The mathematical programming formulation for identifying the optimal value function of a negative Markov decision process (MDP) is non-convex, non-smooth, and computationally intractable. Other well-known MDP solution methods also fail on a negative MDP. More specifically, policy iteration diverges, and value iteration converges but does not provide an