Optimal stopping is the problem of determining when to stop a stochastic system in order to maximize reward, a problem of practical importance in domains such as finance, operations management, and healthcare. Existing methods for high-dimensional optimal stopping that are popular in practice produce deterministic linear policies, which stop deterministically based on the sign of a weighted sum of basis functions, but are not guaranteed to find the best policy within this policy class for a given basis function architecture. In this paper, we propose a new methodology for optimal stopping based on randomized linear policies, which choose to stop with a probability determined by a weighted sum of basis functions. We motivate these policies in two ways. First, we establish that, under mild conditions and for a fixed basis function architecture, optimizing over randomized linear policies is equivalent to optimizing over deterministic linear policies. Second, we construct simple parameterized families of problem instances on which the popular least squares Monte Carlo approach can be made to perform arbitrarily poorly, whereas the best deterministic linear policy is either exactly optimal or arbitrarily close to optimal. We formulate the problem of learning randomized linear policies from data as a smooth, non-convex sample average approximation (SAA) problem. We show that this SAA problem is in general NP-hard, and we consequently develop a practical heuristic for solving it based on alternating maximization. We demonstrate the value of our method numerically through a rich set of experiments, involving a standard Bermudan max-call option pricing benchmark, a more complex problem involving fractional Brownian motion, and a stylized exit-timing problem calibrated using real cryptocurrency data.
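To make the policy class and the SAA objective concrete, the following Python sketch evaluates a randomized linear stopping policy on simulated paths: at each exercise date the policy stops with a probability given by a logistic function of a weighted sum of basis functions, and the sample-average expected discounted reward is computed path by path. The logistic link, the geometric Brownian motion dynamics, the basis functions, and all parameter values are illustrative assumptions for this sketch, not the paper's specification.

```python
# Minimal sketch (not the paper's implementation) of evaluating a randomized
# linear stopping policy and its sample average approximation (SAA) objective.
import numpy as np

rng = np.random.default_rng(0)

# Simulate geometric Brownian motion paths (hypothetical dynamics/parameters).
n_paths, n_steps = 10_000, 54
S0, K, r, sigma, dt = 100.0, 100.0, 0.05, 0.2, 3.0 / 54
Z = rng.standard_normal((n_paths, n_steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z, axis=1))

def basis(s, t):
    """Simple basis functions of price and time (an illustrative choice)."""
    return np.stack([np.ones_like(s), s / K, (s / K) ** 2, np.full_like(s, t)], axis=-1)

def saa_objective(w):
    """Sample-average reward of the randomized linear policy with weights w.

    At each exercise date the policy stops with probability
    sigmoid(w . basis(S_t, t)); the per-path expected reward is obtained by
    accumulating the probability of surviving (not yet stopping) to each date.
    """
    expected_reward = np.zeros(n_paths)
    survive = np.ones(n_paths)  # P(policy has not stopped before date t)
    for t in range(n_steps):
        s = S[:, t]
        p_stop = 1.0 / (1.0 + np.exp(-basis(s, (t + 1) * dt) @ w))
        reward = np.exp(-r * (t + 1) * dt) * np.maximum(s - K, 0.0)
        expected_reward += survive * p_stop * reward
        survive *= 1.0 - p_stop
    return expected_reward.mean()

w0 = np.zeros(4)
print("Estimated value with zero weights:", saa_objective(w0))
```

Because `saa_objective` is smooth in the weights `w`, it could in principle be handed to any continuous optimizer; the paper's point is that this objective is non-convex and NP-hard in general, which is what motivates a heuristic such as alternating maximization rather than exact optimization.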
Working paper