Spurious Local Minima Exist for Almost All Over-parameterized Neural Networks

A popular explanation for the efficiency of training deep neural networks is that over-parameterized neural networks have a benign loss landscape. However, it remains unclear whether over-parameterized neural networks contain spurious local minima in general, since existing positive results cannot prove the non-existence of bad local minima, and existing negative results impose strong restrictions on the activation functions, data samples, or network architecture. In this paper, we answer this question with a surprising negative result. In particular, we prove that for almost all deep over-parameterized non-linear neural networks, spurious local minima exist for generic input data samples. Our result gives a more precise characterization of the landscape of deep neural networks and corrects a misunderstanding that has persisted for decades.
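To pin down the terminology, the sketch below restates the main claim in symbols. The notation is assumed for illustration ($\Theta$ for the network's parameter space, $L_S$ for the empirical loss on a data sample $S$); this is a paraphrase of the abstract's claim, not the paper's verbatim theorem statement.

```latex
% Informal restatement of the abstract's main claim; this is a
% paraphrase for clarity, not the paper's verbatim theorem. The
% notation is assumed: \Theta is the network's parameter space and
% L_S is the empirical loss on a data sample S.
For almost every deep over-parameterized non-linear architecture and
generic sample $S$, there exists a parameter vector
$\theta^{\ast} \in \Theta$ such that
\[
  \theta^{\ast} \ \text{is a local minimum of}\ L_S
  \qquad\text{and}\qquad
  L_S(\theta^{\ast}) \;>\; \inf_{\theta \in \Theta} L_S(\theta),
\]
that is, $\theta^{\ast}$ is a \emph{spurious} local minimum: locally
optimal yet strictly worse than the global infimum of the loss.
```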