We propose a new multi-model partially observable Markov decision process (MPOMDP) model to address model ambiguity in partially observable Markov decision processes (POMDPs). Here, model ambiguity refers to the case in which there are multiple credible optimization models with the same structure but different model parameters. The proposed MPOMDP model learns the distribution of the true model from system outputs over time and seeks a single optimal policy that maximizes the expected sum of all future rewards across all plausible models. We discuss important structural properties of the MPOMDP model, which both reveal the benefit of accounting for model ambiguity and motivate solution methods for MPOMDPs. We develop an exact solution method and two approximation methods that are shown to converge asymptotically, and we compare their performance in computational experiments. Lastly, we use a case study of prostate cancer active surveillance to demonstrate how the MPOMDP model can be applied to a real-world problem, improving medical decision-making by creating policies that are robust to the differing parameters of the multiple plausible models.
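To make the idea of "learning the distribution of the true model from system outputs" concrete, the following is a minimal sketch of one standard way to do it: a Bayesian filter over joint (model, state) beliefs. It assumes that every candidate model shares the same state, action, and observation spaces and differs only in its transition and observation probabilities, and that the true model does not change over time. It is an illustration under those assumptions, not the authors' method; the names `update_belief` and `model_posterior`, and the nested-list layout of `T` and `Z`, are hypothetical.

```python
from typing import List

# Hypothetical layout (an assumption for illustration):
#   T[m][a][s][s2] = P_m(s2 | s, a)   transition probabilities of model m
#   Z[m][a][s2][o] = P_m(o | s2, a)   observation probabilities of model m
#   belief[m][s]   = P(true model = m, current state = s)
Matrix = List[List[float]]


def update_belief(belief: Matrix, T, Z, action: int, obs: int) -> Matrix:
    """Bayes update of the joint (model, state) belief after one step."""
    num_models = len(belief)
    num_states = len(belief[0])
    new_belief = [[0.0] * num_states for _ in range(num_models)]
    for m in range(num_models):
        for s2 in range(num_states):
            # Predict: probability of moving to s2 if model m is the true model.
            pred = sum(belief[m][s] * T[m][action][s][s2] for s in range(num_states))
            # Correct: weight by the likelihood of the observation under model m.
            new_belief[m][s2] = pred * Z[m][action][s2][obs]
    total = sum(sum(row) for row in new_belief)
    if total == 0.0:
        raise ValueError("observation has zero probability under every model")
    # Normalize so the joint belief sums to one.
    return [[p / total for p in row] for row in new_belief]


def model_posterior(belief: Matrix) -> List[float]:
    """Marginalize out the state to get the posterior over candidate models."""
    return [sum(row) for row in belief]


if __name__ == "__main__":
    # Two hypothetical models with identical dynamics but different observation models.
    T = [[[[0.9, 0.1], [0.0, 1.0]]], [[[0.9, 0.1], [0.0, 1.0]]]]
    Z = [[[[0.8, 0.2], [0.3, 0.7]]], [[[0.6, 0.4], [0.5, 0.5]]]]
    prior = [[0.5, 0.0], [0.5, 0.0]]  # equal model prior, system starts in state 0
    b1 = update_belief(prior, T, Z, action=0, obs=1)
    print(model_posterior(b1))
```

Because the model index never changes within a trajectory, transitions only move probability mass between states of the same model; the observation likelihoods are what shift weight between models over time, which is how the posterior over models is learned from outputs.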