In this paper, we study the loss surface of over-parameterized fully connected deep neural networks. We prove that, for any continuous activation function, the loss function has no bad strict local minimum, both in the regular sense and in the sense of sets. This result holds for any convex and differentiable loss function, and the data samples are only required to be distinct in at least one dimension. Furthermore, we show that bad local minima do exist for a class of activation functions, so without further assumptions it is impossible to prove that every local minimum is a global minimum.