Data-Driven Ranges of Near-Optimal Actions for Finite Markov Decision Processes

Markov decision process (MDP) models have been used to obtain non-stationary optimal decision rules in various applications, such as treatment planning in medical decision making. However, in practice, decision makers may prefer other strategies that are not statistically different from the optimal decision rules. To benefit from the decision makers’ expertise and provide flexibility in implementing decision strategies, we introduce a new framework for identifying sets of near-optimal actions for finite MDP models. We present a simulation-based dynamic programming algorithm that can be executed using parallel computing and show that it converges to the optimal solutions exponentially fast under fairly mild conditions. The sets of near-optimal actions are modeled as nonparametric simultaneous confidence intervals on the difference between an approximately optimal action and the remaining alternatives. By analyzing the structure of the sets, we characterize their behavior with respect to the modeling data and identify when they can be ordered as a range. Lastly, we show the scalability of our approach by finding ranges of near-optimal antihypertensive treatment choices for 16.72 million adults in the US.
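The paper's method is not reproduced here, but the core idea of the near-optimal action sets can be sketched: given Monte Carlo return samples for each action in a state, bootstrap simultaneous confidence intervals on the differences between the best empirical action and each alternative, and retain every action whose interval contains zero. The setup below (four hypothetical actions, normal return noise, a max-statistic bootstrap calibration) is an illustrative assumption, not the algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: Monte Carlo return samples for 4 actions in one
# state. In the paper these would come from the simulation-based dynamic
# programming pass; here we simply simulate them.
true_means = np.array([1.00, 0.97, 0.80, 0.99])
samples = {a: true_means[a] + rng.normal(0, 0.3, size=500) for a in range(4)}

def near_optimal_set(samples, n_boot=2000, alpha=0.05, rng=None):
    """Bootstrap simultaneous CIs on Q(a*) - Q(a); keep every action whose
    interval contains zero, i.e. is not distinguishable from the best."""
    rng = rng if rng is not None else np.random.default_rng()
    actions = sorted(samples)
    means = np.array([samples[a].mean() for a in actions])
    a_star = int(np.argmax(means))  # empirically best action

    # Bootstrap the whole vector of differences jointly so the intervals
    # have simultaneous (family-wise) coverage.
    diffs = np.empty((n_boot, len(actions)))
    for b in range(n_boot):
        boot_means = np.array([
            rng.choice(samples[a], size=len(samples[a]), replace=True).mean()
            for a in actions
        ])
        diffs[b] = boot_means[a_star] - boot_means

    # Max-statistic calibration: one common half-width so that the
    # simultaneous coverage is 1 - alpha across all comparisons.
    centered = np.abs(diffs - diffs.mean(axis=0))
    crit = np.quantile(centered.max(axis=1), 1 - alpha)
    lower = diffs.mean(axis=0) - crit
    return a_star, [a for a in actions if lower[a] <= 0]

a_star, keep = near_optimal_set(samples, rng=np.random.default_rng(1))
print(a_star, keep)
```

With the simulated means above, the clearly inferior third action (mean 0.80) is excluded, while actions statistically indistinguishable from the best remain in the set, mirroring the "range of near-optimal actions" idea from the abstract.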

Citation

MGH Institute for Technology Assessment, Harvard Medical School, Boston, MA 02114. June 2021.
