We consider minimization of composite functions of the form $f(g(x))+h(x)$, where $f$ and $h$ are convex functions (which can be nonsmooth) and $g$ is a smooth vector mapping. In addition, we assume that $g$ is the average of finite number of component mappings or the expectation over a family of random component mappings. We propose a class of stochastic variance-reduced prox-linear algorithms for solving such problems and bound their sample complexities for finding an $\epsilon$-stationary point in terms of the total number of evaluations of the component mappings and their Jacobians. When $g$ is a finite average of $N$ components, we obtain sample complexity $O(N+ N^{4/5}\epsilon^{-1})$ for both mapping and Jacobian evaluations. When $g$ is a general expectation, we obtain sample complexities of $O(\epsilon^{-5/2})$ and $O(\epsilon^{-3/2})$ for component mappings and their Jacobians respectively. If in addition $f$ is smooth, then improved sample complexities of $O(N+N^{1/2}\epsilon^{-1})$ and $O(\epsilon^{-3/2})$ are derived for $g$ being a finite average and a general expectation respectively, for both component mapping and Jacobian evaluations.
Citation
Microsoft Research Technical Report: MSR-TR-2020-11