Decomposition Methods for Global Solutions of Mixed-Integer Linear Programs

This paper introduces two decomposition-based methods for two-block mixed-integer linear programs (MILPs), which break the original problem into a sequence of smaller MILP subproblems. The first method is based on the l1-augmented Lagrangian. The second method is based on the alternating direction method of multipliers. When the original problem has a block-angular structure, the subproblems …

An Improved Analysis of Stochastic Gradient Descent with Momentum

SGD with momentum (SGDM) has been widely applied in many machine learning tasks, and it is often applied with dynamic stepsizes and momentum weights tuned in a stagewise manner. Despite its empirical advantage over SGD, the role of momentum is still unclear in general since previous analyses of SGDM either provide worse convergence bounds …
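
For readers unfamiliar with the update being analyzed, the following is a minimal sketch of one common (heavy-ball) form of SGD with momentum. The abstract is truncated and does not show the paper's exact parameterization, so the function name `sgdm`, the constant stepsize `lr`, the momentum weight `beta`, and the toy objective f(x) = 0.5 x^2 with Gaussian gradient noise are all assumptions made for illustration.

```python
import random


def sgdm(grad, x0, lr=0.1, beta=0.9, steps=500, noise_std=0.01, seed=0):
    """Heavy-ball SGD with momentum on a scalar problem (illustrative only).

    grad      : function returning the exact gradient at x
    noise_std : standard deviation of the Gaussian noise added to each
                gradient to mimic stochastic gradients (an assumption)
    """
    rng = random.Random(seed)
    x, m = x0, 0.0
    for _ in range(steps):
        g = grad(x) + rng.gauss(0.0, noise_std)  # noisy gradient sample
        m = beta * m + g                         # momentum buffer
        x = x - lr * m                           # parameter step
    return x


# Toy objective f(x) = 0.5 * x**2, whose gradient is x and whose minimizer is 0.
x_final = sgdm(lambda x: x, x0=5.0)
print(x_final)
```

A stagewise schedule, as mentioned in the abstract, would call such a routine repeatedly with different `lr` and `beta` values; that part is not sketched here.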

Decentralized Learning with Lazy and Approximate Dual Gradients

This paper develops algorithms for decentralized machine learning over a network, where data are distributed, computation is localized, and communication is restricted between neighbors. A line of recent research in this area focuses on improving both computation and communication complexities. The methods SSDA and MSDA (Scaman et al., 2017) have optimal communication complexity when the objective is smooth …

Acceleration of SVRG and Katyusha X by Inexact Preconditioning

Empirical risk minimization is an important class of optimization problems with many popular machine learning applications, and stochastic variance reduction methods are popular choices for solving them. Among these methods, SVRG and Katyusha X (a Nesterov accelerated SVRG) achieve fast convergence without substantial memory requirements. In this paper, we propose to accelerate these two algorithms …

Acceleration of Primal-Dual Methods by Preconditioning and Simple Subproblem Procedures

Primal-Dual Hybrid Gradient (PDHG) and Alternating Direction Method of Multipliers (ADMM) are two widely used first-order optimization methods. They reduce a difficult problem to simple subproblems, so they are easy to implement and have many applications. As first-order methods, however, they are sensitive to problem conditions and can struggle to reach the desired accuracy. To improve …

An Envelope for Davis-Yin Splitting and Strict Saddle Point Avoidance

It is known that operator splitting methods based on Forward Backward Splitting (FBS), Douglas-Rachford Splitting (DRS), and Davis-Yin Splitting (DYS) decompose a difficult optimization problem into simpler subproblems under proper convexity and smoothness assumptions. In this paper, we identify an envelope (an objective function) whose gradient descent iteration under a variable metric coincides with DYS …

Douglas-Rachford Splitting for Pathological Convex Optimization

Despite the vast literature on DRS, there has been very little work analyzing its behavior under pathologies. Most analyses assume a primal solution exists, a dual solution exists, and strong duality holds. When these assumptions are not met, i.e., under pathologies, the theory often breaks down and the empirical performance may degrade significantly. In this …

Run-and-Inspect Method for Nonconvex Optimization and Global Optimality Bounds for R-Local Minimizers

Many optimization algorithms converge to stationary points. When the underlying problem is nonconvex, they may get trapped at local minimizers and occasionally stagnate near saddle points. We propose the Run-and-Inspect Method, which adds an “inspect” phase to existing algorithms that helps escape from non-global stationary points. The inspection samples a set of points in a …

Proximal-Proximal-Gradient Method

In this paper, we present the proximal-proximal-gradient method (PPG), a novel optimization method that is simple to implement and simple to parallelize. PPG generalizes the proximal-gradient method and ADMM and is applicable to minimization problems written as a sum of many differentiable and many non-differentiable convex functions. The non-differentiable functions can be coupled. We furthermore …
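
As background, here is a minimal sketch of the classical proximal-gradient method that PPG is said to generalize. It is not PPG itself, whose update the truncated abstract does not show. The scalar problem, minimize 0.5 (x - b)^2 + lam |x|, and the names `soft_threshold` and `proximal_gradient` are assumptions chosen so the result can be checked by hand: the minimizer is the soft-thresholding of b at level lam.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |x| evaluated at v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def proximal_gradient(b, lam, step=0.5, iters=60):
    """Minimize 0.5 * (x - b)**2 + lam * |x| by proximal-gradient steps."""
    x = 0.0
    for _ in range(iters):
        grad_smooth = x - b  # gradient of the differentiable term
        # Gradient step on the smooth term, then prox of the nonsmooth term.
        x = soft_threshold(x - step * grad_smooth, step * lam)
    return x


print(proximal_gradient(3.0, 1.0))  # approaches soft_threshold(3.0, 1.0) = 2.0
```

Each iteration touches the differentiable term only through its gradient and the non-differentiable term only through its proximal operator, which is the splitting pattern the abstract refers to.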

A New Use of Douglas-Rachford Splitting and ADMM for Identifying Infeasible, Unbounded, and Pathological Conic Programs

In this paper, we present a method for identifying infeasible, unbounded, and pathological conic programs based on Douglas-Rachford splitting, or equivalently ADMM. When an optimization program is infeasible, unbounded, or pathological, the iterates of Douglas-Rachford splitting diverge. Somewhat surprisingly, such divergent iterates still provide useful information, which our method uses for identification. In addition, for strongly …
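
The divergence described above can be seen on a toy problem. The sketch below is not the paper's identification method for conic programs; it applies the standard Douglas-Rachford iteration to an infeasible two-set feasibility problem in the plane. The sets A = {x : x[0] <= 0} and B = {x : x[0] >= 1}, and the helper names, are assumptions picked because the two sets are disjoint and their projections are trivial.

```python
def project_A(z):
    """Projection onto A = {x in R^2 : x[0] <= 0}."""
    return (min(z[0], 0.0), z[1])


def project_B(z):
    """Projection onto B = {x in R^2 : x[0] >= 1}."""
    return (max(z[0], 1.0), z[1])


def drs_feasibility(z0, iters):
    """Douglas-Rachford iteration z <- z + P_B(2 P_A(z) - z) - P_A(z)."""
    z = z0
    steps = []
    for _ in range(iters):
        a = project_A(z)
        reflected = (2 * a[0] - z[0], 2 * a[1] - z[1])
        b = project_B(reflected)
        z_next = (z[0] + b[0] - a[0], z[1] + b[1] - a[1])
        steps.append((z_next[0] - z[0], z_next[1] - z[1]))
        z = z_next
    return z, steps


z_final, steps = drs_feasibility((0.5, 0.0), iters=10)
print(z_final)    # the first coordinate grows by 1 every iteration
print(steps[-1])  # (1.0, 0.0)
```

Because A and B do not intersect, the iterate never settles: it drifts by a constant step, which in this toy case equals the shortest displacement from A to B. That is one simple instance of divergent iterates still carrying usable information.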