It is known that the gradient descent algorithm converges linearly when applied to a strongly convex function with Lipschitz gradient. In this case the algorithm's rate of convergence is determined by the condition number of the function. In a similar vein, it has been shown that a variant of the Frank-Wolfe algorithm with away steps converges linearly when applied to a strongly convex function over a polytope. In a nice extension of the unconstrained case, the algorithm's rate of convergence is determined by the product of the condition number of the function and a certain condition number of the polytope. We shed new light on the latter type of polytope conditioning. In particular, we show that previous and seemingly different approaches to defining a suitable condition measure for the polytope are essentially equivalent to each other. Perhaps more interestingly, they can all be unified via a parameter of the polytope that formalizes a key premise linked to the algorithm's linear convergence. We also give new insight into the linear convergence property. For a convex quadratic objective, we show that the rate of convergence is determined by the condition number of a suitably scaled polytope.
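For concreteness, the linear convergence statements referenced above can be written schematically as follows. This is a standard illustration rather than a result stated in the abstract; the symbols $\mu$, $L$, $c$, and the polytope measure $\Phi(P)$ are generic placeholders. For gradient descent with step size $1/L$ on a $\mu$-strongly convex function with $L$-Lipschitz gradient, one has
\[
f(x_k) - f^\star \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\bigl(f(x_0) - f^\star\bigr),
\]
so the condition number $\kappa = L/\mu$ of the function governs the rate. The analogous bound for the Frank-Wolfe variant with away steps over a polytope $P$ has the schematic form
\[
f(x_k) - f^\star \;\le\; \Bigl(1 - c\,\tfrac{\mu}{L}\,\Phi(P)\Bigr)^{k}\bigl(f(x_0) - f^\star\bigr),
\]
where $\Phi(P)$ denotes a condition measure of the polytope and $c$ is an absolute constant; the precise form of such a measure is the subject of the article.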
Polytope conditioning and linear convergence of the Frank-Wolfe algorithm