On a Frank-Wolfe Approach for Abs-smooth Functions

We propose an algorithm which appears to be the first bridge between the fields of conditional gradient
methods and abs-smooth optimization. Our nonsmooth nonconvex problem setting is motivated by machine
learning, since the broad class of abs-smooth functions includes, for instance, the squared $l_2$-error
of a neural network with ReLU or hinge loss activation. To overcome the nonsmoothness in our problem, we
propose a generalization to the traditional Frank-Wolfe gap and prove that first-order minimality is
achieved when it vanishes. We derive a convergence rate for our algorithm which is identical to the
smooth case. Although our algorithm necessitates the solution of a subproblem which is more challenging
than the smooth case, we provide an efficient numerical method for its partial solution, and we identify
several applications where our approach fully solves the subproblem. Numerical and theoretical convergence
is demonstrated, yielding several conjectures.



View On a Frank-Wolfe Approach for Abs-smooth Functions