A gradient type algorithm with backward inertial steps for a nonconvex minimization

We investigate an algorithm of gradient type with a backward inertial step in connection with the minimization of a nonconvex differentiable function. We show that the generated sequences converge to a critical point of the objective function, if a regularization of the objective function satis es the Kurdyka-Lojasiewicz property. Further, we provide convergence rates for the … Read more

Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations

In this paper, we study the loss surface of the over-parameterized fully connected deep neural networks. We prove that for any continuous activation functions, the loss function has no bad strict local minimum, both in the regular sense and in the sense of sets. This result holds for any convex and differentiable loss function, and … Read more

Sharp worst-case evaluation complexity bounds for arbitrary-order nonconvex optimization with inexpensive constraints

We provide sharp worst-case evaluation complexity bounds for nonconvex minimization problems with general inexpensive constraints, i.e.\ problems where the cost of evaluating/enforcing of the (possibly nonconvex or even disconnected) constraints, if any, is negligible compared to that of evaluating the objective function. These bounds unify, extend or improve all known upper and lower complexity bounds … Read more

A Subsampling Line-Search Method with Second-Order Results

In many contemporary optimization problems such as those arising in machine learning, it can be computationally challenging or even infeasible to evaluate an entire function or its derivatives. This motivates the use of stochastic algorithms that sample problem data, which can jeopardize the guarantees obtained through classical globalization techniques in optimization such as a trust … Read more

Global Solutions of Nonconvex Standard Quadratic Programs via Mixed Integer Linear Programming Reformulations

A standard quadratic program is an optimization problem that consists of minimizing a (nonconvex) quadratic form over the unit simplex. We focus on reformulating a standard quadratic program as a mixed integer linear programming problem. We propose two alternative mixed integer linear programming formulations. Our first formulation is based on casting a standard quadratic program … Read more

Second-order Guarantees of Distributed gradient Algorithms

We consider distributed smooth nonconvex unconstrained optimization over networks, modeled as a connected graph. We examine the behavior of distributed gradient-based algorithms near strict saddle points. Specifically, we establish that (i) the renowned Distributed Gradient Descent (DGD) algorithm likely converges to a neighborhood of a Second-order Stationary (SoS) solution; and (ii) the more recent class … Read more

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

In this paper, we propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. We apply SPIDER to two tasks, namely the stochastic first-order and zeroth-order methods. For stochastic first-order method, combining SPIDER with normalized gradient descent, we propose … Read more

Convergence rates for an inertial algorithm of gradient type associated to a smooth nonconvex minimization

We investigate an inertial algorithm of gradient type in connection with the minimization of a nonconvex differentiable function. The algorithm is formulated in the spirit of Nesterov’s accelerated convex gradient method. We show that the generated sequences converge to a critical point of the objective function, if a regularization of the objective function satis es the … Read more

Efficient Optimization Algorithms for Robust Principal Component Analysis and Its Variants

Robust PCA has drawn significant attention in the last decade due to its success in numerous application domains, ranging from bio-informatics, statistics, and machine learning to image and video processing in computer vision. Robust PCA and its variants such as sparse PCA and stable PCA can be formulated as optimization problems with exploitable special structures. … Read more

Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization

Recent years have witnessed a surge of interest in parallel and distributed optimization methods for large-scale systems. In particular, nonconvex large-scale optimization problems have found a wide range of applications in several engineering fields. The design and the analysis of such complex, large-scale, systems pose several challenges and call for the development of new optimization … Read more