Markov decision processes (MDPs) have found success in many application areas that involve sequential decision making under uncertainty, including the evaluation and design of treatment and screening protocols for medical decision making. However, these models are only as useful as the data used to parameterize them, and multiple competing data sources are common in many application areas, including medicine. In this article, we introduce the Multi-model MDP (MMDP), which generalizes a standard MDP by allowing for multiple models of the rewards and transition probabilities. Solving the MMDP yields a single policy that maximizes the weighted performance over all models. This approach allows the decision maker to explicitly trade off conflicting sources of data while generating a policy of the same level of complexity as policies for models that consider only a single source of data. We study the structural properties of the MMDP and show that it is NP-hard. We develop exact methods and fast approximation methods supported by bounds on the error. Finally, we illustrate the effectiveness and scalability of our approach using a case study in preventative blood pressure and cholesterol management that accounts for conflicting published cardiovascular risk models and multiple models of the natural history of the disease.
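To make the weighted objective concrete, the following is a minimal sketch rather than the exact or approximation methods developed in the article. It assumes a finite horizon, deterministic Markov policies, and a terminal value of zero, none of which is stated in the abstract. All function names and the toy data are hypothetical. The policy is found by brute-force enumeration, which is practical only for very small instances.

```python
from itertools import product


def policy_value(P, R, policy, init, horizon):
    """Expected total reward of one policy under one model.

    P[a][s][t] is the probability of moving from state s to t under action a,
    R[s][a] is the one-step reward, policy[k][s] is the action taken at
    epoch k in state s, and init[s] is the initial state distribution.
    """
    n = len(init)
    v = [0.0] * n  # assumption: terminal value is zero
    for k in reversed(range(horizon)):  # backward induction for a fixed policy
        v = [
            R[s][policy[k][s]]
            + sum(P[policy[k][s]][s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
    return sum(init[s] * v[s] for s in range(n))


def weighted_value(models, weights, policy, init, horizon):
    """Weighted performance of a single policy across all models."""
    return sum(
        w * policy_value(P, R, policy, init, horizon)
        for w, (P, R) in zip(weights, models)
    )


def brute_force_mmdp(models, weights, init, horizon, n_actions):
    """Enumerate every deterministic Markov policy and keep the best one."""
    n = len(init)
    best_policy, best_value = None, float("-inf")
    for flat in product(range(n_actions), repeat=horizon * n):
        policy = [flat[k * n:(k + 1) * n] for k in range(horizon)]
        value = weighted_value(models, weights, policy, init, horizon)
        if value > best_value:
            best_policy, best_value = policy, value
    return best_policy, best_value


# Hypothetical toy instance: 2 states, 2 actions, 2 competing models.
P_a = [[[0.9, 0.1], [0.0, 1.0]], [[0.5, 0.5], [0.0, 1.0]]]
P_b = [[[0.6, 0.4], [0.0, 1.0]], [[0.8, 0.2], [0.0, 1.0]]]
R = [[1.0, 0.8], [0.0, 0.0]]  # same rewards in both models
toy_models = [(P_a, R), (P_b, R)]

if __name__ == "__main__":
    policy, value = brute_force_mmdp(toy_models, [0.5, 0.5], [1.0, 0.0], 3, 2)
    print(policy, value)
```

Changing the weights shifts how much each model influences the selected policy, which is the trade-off the decision maker controls. The number of policies enumerated grows exponentially in the number of states and decision epochs, so this sketch only illustrates the objective.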
Steimle, L. N., Kaufman, D. L., and Denton, B. T. Multi-model Markov Decision Processes. Optimization-online, Updated on August 10, 2019.