The Adaptive Sampling Gradient Method: Optimizing Smooth Functions with an Inexact Oracle

Consider settings such as stochastic optimization where a smooth objective function $f$ is unknown but can be estimated with an \emph{inexact oracle} such as quasi-Monte Carlo (QMC) or numerical quadrature. The inexact oracle is assumed to yield function estimates having error that decays with increasing oracle effort. For solving such problems, we present the Adaptive Sampling Gradient Method (ASGM) in two flavors depending on whether the step size used within ASGM is constant or determined through a backtracking line search. ASGM’s salient feature is the adaptive manner in which it constructs gradient estimates (henceforth called \emph{gradient approximates}), by exerting just enough oracle effort at each iterate to keep the error in the gradient approximate within a constant factor of the norm of the gradient approximate. ASGM applies to both derivative-based and derivative-free contexts, and generates iterates that globally convergence to a first-order critical point. We also prove two sets of results on ASGM’s \emph{work complexity} with respect to the gradient norm: (i) when $f$ is non-convex, ASGM’s work complexity is arbitrarily close to $\mathcal{O}(\epsilon^{-2 – \frac{1}{\mu(\alpha)}})$, where $\mu(\alpha)$ is the error decay rate of the gradient approximate expressed in terms of the error decay rate $\alpha$ of the objective function approximate; (ii) when $f$ is strongly convex, ASGM’s work complexity is arbitrarily close to $\mathcal{O}(\epsilon^{- \frac{1}{\mu(\alpha)}})$. We compare these complexities to those obtained from methods that use traditional random sampling. We also illustrate the calculation of $\alpha$ and $\mu(\alpha)$ for common choices, e.g., QMC with finite-difference derivatives.

Article

Download