An Improved Analysis of Stochastic Gradient Descent with Momentum

SGD with momentum (SGDM) has been widely applied in many machine learning tasks, and it is often run with dynamic stepsizes and momentum weights tuned in a stagewise manner. Despite its empirical advantage over SGD, the role of momentum is still unclear in general, since previous analyses of SGDM either provide worse convergence bounds … Read more
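
As a point of reference, the sketch below shows the standard heavy-ball form of SGDM run with a stagewise (piecewise-constant) stepsize schedule. The `grad` oracle, stage lengths, and schedule are illustrative placeholders; the paper's specific parameter choices and analysis are not reproduced here.

```python
import numpy as np

def sgdm(grad, w0, stage_stepsizes, momentum=0.9, iters_per_stage=1000):
    """Heavy-ball SGD with momentum, run in stages with a constant stepsize per stage.

    grad(w)         -- returns a stochastic gradient at w (placeholder oracle)
    stage_stepsizes -- one stepsize per stage (a stagewise schedule)
    """
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)                      # momentum buffer
    for eta in stage_stepsizes:               # each stage keeps its stepsize fixed
        for _ in range(iters_per_stage):
            g = grad(w)                       # draw a stochastic gradient
            v = momentum * v + g              # accumulate momentum
            w = w - eta * v                   # parameter update
    return w
```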

Decentralized Learning with Lazy and Approximate Dual Gradients

This paper develops algorithms for decentralized machine learning over a network, where data are distributed, computation is localized, and communication is restricted between neighbors. A line of recent research in this area focuses on improving both computation and communication complexities. The methods SSDA and MSDA (Scaman et al., 2017) have optimal communication complexity when the objective is smooth … Read more
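
To make the setting concrete, here is a minimal sketch of plain decentralized gradient descent with a gossip (mixing) matrix: each node averages with its neighbors, then takes a local gradient step. This only illustrates the distributed-data, neighbor-communication model; it is not the lazy/approximate dual-gradient algorithm developed in the paper, and `grads`, `W`, and the stepsize are assumed inputs.

```python
import numpy as np

def decentralized_gd(grads, W, x0, stepsize=0.1, rounds=100):
    """Illustrative decentralized gradient descent (not the paper's algorithm).

    grads -- list of local gradient oracles, one per node
    W     -- doubly stochastic mixing matrix; W[i, j] > 0 only for neighbors
    x0    -- array of shape (n_nodes, dim) holding the initial local iterates
    """
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        # communication step: average with neighbors through the mixing matrix
        x = W @ x
        # computation step: each node takes a step along its local gradient
        x = x - stepsize * np.vstack([g(x[i]) for i, g in enumerate(grads)])
    return x
```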

Acceleration of SVRG and Katyusha X by Inexact Preconditioning

Empirical risk minimization is an important class of optimization problems with many popular machine learning applications, and stochastic variance reduction methods are popular choices for solving them. Among these methods, SVRG and Katyusha X (a Nesterov accelerated SVRG) achieve fast convergence without substantial memory requirement. In this paper, we propose to accelerate these two algorithms … Read more
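
For context, the sketch below is the plain SVRG loop (a snapshot full gradient plus variance-reduced inner updates). The inexact preconditioning proposed in the paper, and the Katyusha X acceleration, are not shown; `grad_i` is a placeholder component-gradient oracle.

```python
import numpy as np

def svrg(grad_i, n, w0, stepsize=0.01, epochs=20, inner_iters=None, seed=None):
    """Plain SVRG on an n-term finite sum.

    grad_i(w, i) -- gradient of the i-th component function at w (placeholder oracle)
    """
    rng = np.random.default_rng(seed)
    m = inner_iters or n
    w_tilde = np.array(w0, dtype=float)
    for _ in range(epochs):
        # full gradient at the snapshot: the variance-reduction anchor
        mu = sum(grad_i(w_tilde, i) for i in range(n)) / n
        w = w_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced stochastic gradient
            g = grad_i(w, i) - grad_i(w_tilde, i) + mu
            w = w - stepsize * g
        w_tilde = w                           # update the snapshot
    return w_tilde
```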

Acceleration of Primal-Dual Methods by Preconditioning and Simple Subproblem Procedures

Primal-Dual Hybrid Gradient (PDHG) and Alternating Direction Method of Multipliers (ADMM) are two widely-used first-order optimization methods. They reduce a difficult problem to simple subproblems, so they are easy to implement and have many applications. As first-order methods, however, they are sensitive to problem conditions and can struggle to reach the desired accuracy. To improve … Read more
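
As a baseline for what is being preconditioned, here is a sketch of the vanilla PDHG iteration for min_x g(x) + f(Kx), written with placeholder proximal oracles; the preconditioning and simple subproblem procedures studied in the paper are not included.

```python
import numpy as np

def pdhg(K, prox_g, prox_fconj, x0, y0, tau, sigma, iters=500):
    """Vanilla PDHG for min_x g(x) + f(Kx).

    prox_g(v, tau)       -- proximal operator of g with parameter tau
    prox_fconj(v, sigma) -- proximal operator of the conjugate f* with parameter sigma
    Convergence requires tau * sigma * ||K||^2 < 1.
    """
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    for _ in range(iters):
        x_new = prox_g(x - tau * K.T @ y, tau)                       # primal step
        y = prox_fconj(y + sigma * K @ (2 * x_new - x), sigma)       # dual step with extrapolation
        x = x_new
    return x, y
```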

An Envelope for Davis-Yin Splitting and Strict Saddle Point Avoidance

It is known that operator splitting methods based on Forward Backward Splitting (FBS), Douglas-Rachford Splitting (DRS), and Davis-Yin Splitting (DYS) decompose a difficult optimization problem into simpler subproblems under proper convexity and smoothness assumptions. In this paper, we identify an envelope (an objective function) whose gradient descent iteration under a variable metric coincides with DYS … Read more
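
For reference, the sketch below is the standard DYS fixed-point iteration for min_x f(x) + g(x) + h(x) with h smooth, using placeholder prox and gradient oracles; the envelope function and the variable-metric gradient-descent interpretation from the paper are not reproduced.

```python
import numpy as np

def davis_yin(prox_f, prox_g, grad_h, z0, gamma, iters=500):
    """Davis-Yin splitting for min_x f(x) + g(x) + h(x), with h smooth.

    prox_f(v, gamma), prox_g(v, gamma) -- proximal oracles (placeholders)
    grad_h(v)                          -- gradient of the smooth term
    """
    z = np.array(z0, dtype=float)
    for _ in range(iters):
        xg = prox_g(z, gamma)                                   # backward step on g
        xf = prox_f(2 * xg - z - gamma * grad_h(xg), gamma)     # forward-backward step on f and h
        z = z + xf - xg                                         # fixed-point update
    return prox_g(z, gamma)
```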

Douglas-Rachford Splitting for Pathological Convex Optimization

Despite the vast literature on DRS, there has been very little work analyzing its behavior under pathologies. Most analyses assume a primal solution exists, a dual solution exists, and strong duality holds. When these assumptions are not met, i.e., under pathologies, the theory often breaks down and the empirical performance may degrade significantly. In this … Read more

A New Use of Douglas-Rachford Splitting and ADMM for Identifying Infeasible, Unbounded, and Pathological Conic Programs

In this paper, we present a method for identifying infeasible, unbounded, and pathological conic programs based on Douglas-Rachford splitting, or equivalently ADMM. When an optimization program is infeasible, unbounded, or pathological, the iterates of Douglas-Rachford splitting diverge. Somewhat surprisingly, such divergent iterates still provide useful information, which our method uses for identification. In addition, for strongly … Read more
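
The toy sketch below (not the paper's full classification procedure) hints at the idea: apply DRS, with projections as the proximal steps, to a feasibility problem between two disjoint balls. The iterates themselves diverge, but the per-iteration differences z_{k+1} - z_k settle down to a nonzero vector, and that nonzero limit certifies that the two sets have no common point. The sets, dimensions, and iteration count are illustrative choices.

```python
import numpy as np

def proj_ball(z, center, radius):
    """Euclidean projection onto a ball."""
    d = z - center
    n = np.linalg.norm(d)
    return z.copy() if n <= radius else center + radius * d / n

def drs_displacement(z0, iters=2000):
    """Run DRS on an infeasible two-set feasibility problem; return the last z_{k+1} - z_k."""
    z = np.array(z0, dtype=float)
    proj_A = lambda v: proj_ball(v, np.array([0.0, 0.0]), 1.0)   # unit ball at the origin
    proj_B = lambda v: proj_ball(v, np.array([5.0, 0.0]), 1.0)   # disjoint ball
    diff = np.zeros_like(z)
    for _ in range(iters):
        x = proj_A(z)
        y = proj_B(2 * x - z)
        z_next = z + y - x                    # DRS step with projections as prox operators
        diff = z_next - z
        z = z_next
    return diff

# A nonzero limit of the differences indicates the two sets do not intersect.
print(drs_displacement(np.zeros(2)))
```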