The stochastic Frank-Wolfe method has recently attracted much general interest in the context of optimization for statistical and machine learning due to its ability to work with a more general feasible region. However, there has been a complexity gap in the guaranteed convergence rate for stochastic Frank-Wolfe compared to its deterministic counterpart. In this work, we present a new stochastic Frank-Wolfe method which closes this gap by introducing the notion of a “substitute” gradient” that is a not-necessarily unbiased sample of the gradient. Moreover, we show that this new approach is equivalent to a randomized coordinate mirror descent algorithm applied to the dual problem, which in turn provides a new interpretation of dual coordinate descent method in the primal space. When the regularizer is furthermore strongly convex, we show that the generalized stochastic Frank-Wolfe method as well as the randomized dual coordinate descent present linear convergence. These new results are benefited from the understanding that first-order methods can inherently minimize the primal-dual gap.
Citation
MIT Operations Research Center Working Paper, MIT, July 2018.