Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition

This work analyzes the solution trajectory of gradient-based algorithms via a novel basis function decomposition. We show that, although solution trajectories of gradient-based algorithms may vary depending on the learning task, they behave almost monotonically when projected onto an appropriate orthonormal function basis. Such projection gives rise to a basis function decomposition of the solution … Read more

Sparse PCA With Multiple Components

Sparse Principal Component Analysis (sPCA) is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. This involves solving a sparsity and orthogonality-constrained convex maximization problem, which is extremely computationally challenging. Most existing works address sparse PCA via methods—such as iteratively computing … Read more

Fixed-Point Automatic Differentiation of Forward–Backward Splitting Algorithms for Partly Smooth Functions

A large class of non-smooth practical optimization problems can be written as minimization of a sum of smooth and partly smooth functions. We consider such structured problems which also depend on a parameter vector and study the problem of differentiating its solution mapping with respect to the parameter which has far reaching applications in sensitivity … Read more

Optimized convergence of stochastic gradient descent by weighted averaging

Under mild assumptions stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the arithmetic mean of all iterates converges considerably slower to the optimal solution than the iterates themselves. And also in the … Read more

A Simplified Convergence Theory for Byzantine Resilient Stochastic Gradient Descent

In distributed learning, a central server trains a model according to updates provided by nodes holding local data samples. In the presence of one or more malicious servers sending incorrect information (a Byzantine adversary), standard algorithms for model training such as stochastic gradient descent (SGD) fail to converge. In this paper, we present a simplified … Read more

Stochastic nested primal-dual method for nonconvex constrained composition optimization

In this paper we study the nonconvex constrained composition optimization, in which the objective contains a composition of two expected-value functions whose accurate information is normally expensive to calculate. We propose a STochastic nEsted Primal-dual (STEP) method for such problems. In each iteration, with an auxiliary variable introduced to track the inner layer function values … Read more

Computing Tchebychev weight space decomposition for multiobjective discrete optimization problems

Multiobjective discrete optimization (MODO) techniques, including weight space decomposition, have received increasing attention in the last decade. The primary weight space decomposition technique in the literature is defined for the weighted sum utility function, through which sets of weights are assigned to a subset of the nondominated set. Recent work has begun to study the … Read more