Global Convergence in Deep Learning with Variable Splitting via the Kurdyka-Łojasiewicz Property

Published: 2018/10/22, Updated: 2019/07/05

Categories: Convex and Nonsmooth Optimization, Data-Mining
Keywords: block-coordinate descent, deep learning, global convergence, kurdyka-lojasiewicz inequality
Short URL: https://optimization-online.org/?p=15454

Deep learning has recently attracted a significant amount of attention due to its great empirical success. However, why training deep neural networks (DNNs) is effective remains a mystery, given the nonconvex optimization problems involved. In this paper, we aim to provide some theoretical understanding of such optimization problems. In particular, the Kurdyka-Łojasiewicz (KL) property is established for DNN training with variable splitting schemes, which leads to the global convergence of block coordinate descent (BCD) type algorithms to a critical point of the objective function under natural conditions of DNNs. Some existing BCD algorithms can be viewed as special cases of this framework.
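To make the idea of variable splitting with BCD concrete, here is a minimal sketch for a one-hidden-layer ReLU network. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the auxiliary variables `U1` (pre-activation) and `V1` (post-activation), the squared loss, the ridge penalty `lam`, the splitting weight `gamma`, and the function names are all hypothetical choices made for this example. Each block is minimized exactly while the others are held fixed, so the split objective cannot increase from one sweep to the next.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def objective(W1, U1, V1, W2, X, Y, gamma, lam):
    # Squared loss + quadratic penalties enforcing U1 ~ W1 X and V1 ~ relu(U1)
    # + ridge regularization on the weights (all assumed for this sketch).
    fit = 0.5 * np.sum((W2 @ V1 - Y) ** 2)
    split = 0.5 * gamma * (np.sum((V1 - relu(U1)) ** 2)
                           + np.sum((U1 - W1 @ X) ** 2))
    reg = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return fit + split + reg

def update_U(V1, A):
    # Elementwise exact minimizer of (v - relu(u))^2 + (u - a)^2.
    # The problem is a convex quadratic on u >= 0 and on u <= 0,
    # so we solve on each half-line and keep the better candidate.
    u_pos = np.maximum(0.5 * (V1 + A), 0.0)
    u_neg = np.minimum(A, 0.0)
    f_pos = (V1 - u_pos) ** 2 + (u_pos - A) ** 2
    f_neg = V1 ** 2 + (u_neg - A) ** 2
    return np.where(f_pos <= f_neg, u_pos, u_neg)

def bcd_train(X, Y, hidden=8, gamma=1.0, lam=0.1, iters=50, seed=0):
    # X has shape (d, n), Y has shape (k, n); columns are samples.
    rng = np.random.default_rng(seed)
    d, k = X.shape[0], Y.shape[0]
    W1 = 0.1 * rng.standard_normal((hidden, d))
    W2 = 0.1 * rng.standard_normal((k, hidden))
    U1 = W1 @ X
    V1 = relu(U1)
    I_h, I_d = np.eye(hidden), np.eye(d)
    history = [objective(W1, U1, V1, W2, X, Y, gamma, lam)]
    for _ in range(iters):
        # W2 block: ridge regression of Y on V1.
        W2 = np.linalg.solve(V1 @ V1.T + lam * I_h, V1 @ Y.T).T
        # V1 block: linear least squares coupling the loss and relu(U1).
        V1 = np.linalg.solve(W2.T @ W2 + gamma * I_h,
                             W2.T @ Y + gamma * relu(U1))
        # U1 block: closed-form elementwise update.
        U1 = update_U(V1, W1 @ X)
        # W1 block: ridge regression of U1 on X.
        W1 = np.linalg.solve(gamma * (X @ X.T) + lam * I_d,
                             gamma * (X @ U1.T)).T
        history.append(objective(W1, U1, V1, W2, X, Y, gamma, lam))
    return W1, W2, history
```

A possible usage, on synthetic data chosen only for illustration:

```python
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 40))
Y = np.sin(X.sum(axis=0, keepdims=True))
W1, W2, history = bcd_train(X, Y, iters=30)
print(history[0], history[-1])  # the split objective should not increase
```

A non-increasing objective is only a sanity check for this sketch. The global convergence to a critical point stated in the abstract is a property of the BCD schemes and conditions treated in the paper, which this simplified example does not claim to reproduce.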