Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity

It was recently found that the standard version of multi-block cyclic ADMM diverges. Interestingly, Gaussian Back Substitution ADMM (GBS-ADMM) and symmetric Gauss-Seidel ADMM (sGS-ADMM) do not have the divergence issue. Therefore, it seems that symmetrization can improve the performance of the classical cyclic order. In another recent work, cyclic CD (Coordinate Descent) was shown to … Read more

Spurious Local Minima Exist for Almost All Over-parameterized Neural Networks

A popular belief for explaining the efficiency in training deep neural networks is that over-paramenterized neural networks have nice landscape. However, it still remains unclear whether over-parameterized neural networks contain spurious local minima in general, since all current positive results cannot prove non-existence of bad local minima, and all current negative results have strong restrictions … Read more

Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations

In this paper, we study the loss surface of the over-parameterized fully connected deep neural networks. We prove that for any continuous activation functions, the loss function has no bad strict local minimum, both in the regular sense and in the sense of sets. This result holds for any convex and differentiable loss function, and … Read more