Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method for Empirical Risk Minimization

The Frank-Wolfe method has become increasingly useful in statistical and machine learning applications, due to the structure-inducing properties of the iterates, and especially in settings where linear minimization over the feasible set is more computationally efficient than projection. In the setting of Empirical Risk Minimization — one of the fundamental optimization problems in statistical and …
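
As a point of reference, here is a minimal sketch of the classical Frank-Wolfe iteration for a least-squares ERM problem over an l1-ball, where the linear minimization oracle reduces to picking a signed vertex and no projection is ever needed. The loss, feasible set, and step-size rule are illustrative assumptions; the Taylor-approximated gradient variant of the paper is not shown.

```python
import numpy as np

def frank_wolfe_l1(X, y, radius=1.0, T=200):
    """Classical Frank-Wolfe for f(w) = (1/(2n))||Xw - y||^2 over {w : ||w||_1 <= radius}."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(T):
        grad = X.T @ (X @ w - y) / n            # exact ERM gradient
        i = np.argmax(np.abs(grad))             # LMO: best vertex of the l1-ball
        s = np.zeros(d)
        s[i] = -radius * np.sign(grad[i])
        gamma = 2.0 / (t + 2.0)                 # standard open-loop step size
        w = (1 - gamma) * w + gamma * s         # convex combination keeps iterates feasible
    return w
```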

Accelerated Stochastic Peaceman-Rachford Method for Empirical Risk Minimization

This work studies an Accelerated Stochastic Peaceman-Rachford Splitting Method (AS-PRSM) for solving a family of structured empirical risk minimization problems. The objective function to be optimized is the sum of a possibly nonsmooth convex function and a finite sum of smooth convex component functions. The smooth subproblem in AS-PRSM is solved by a stochastic gradient method using variance reduction …
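
A generic sketch of the variance-reduction idea mentioned for the smooth subproblem: an SVRG-style estimator built from a snapshot point and its full gradient. The logistic component, function names, and sampling scheme are assumptions for illustration; this is not the AS-PRSM scheme itself.

```python
import numpy as np

def component_grad(w, X, y, i):
    """Gradient of the i-th logistic-loss component (illustrative choice of f_i)."""
    xi, yi = X[i], y[i]
    return -yi * xi / (1.0 + np.exp(yi * (xi @ w)))

def vr_gradient(w, w_snap, full_grad_snap, X, y, rng):
    """SVRG-style variance-reduced estimator of the finite-sum gradient.

    full_grad_snap is the full gradient at the snapshot w_snap. The estimator
    is unbiased, and its variance shrinks as w approaches w_snap, which is what
    makes it attractive inside the smooth subproblem of a splitting method.
    """
    i = rng.integers(len(y))
    return component_grad(w, X, y, i) - component_grad(w_snap, X, y, i) + full_grad_snap
```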

A Distributed Quasi-Newton Algorithm for Empirical Risk Minimization with Nonsmooth Regularization

We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving ERM problems with a nonsmooth regularization term. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or work only for specific regularizers. Our algorithm uses successive quadratic approximations, and we describe how to …
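
For orientation, a single-machine sketch of one successive-quadratic-approximation step for an l1-regularized problem: the quadratic model built from a (quasi-Newton) Hessian approximation H is minimized inexactly by proximal gradient. The distributed communication pattern that is the point of the paper is not shown, and all names below are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sqa_step(w, grad, H, lam, inner_iters=50):
    """One step for f(w) + lam*||w||_1: minimize the model
    grad^T d + 0.5 d^T H d + lam*||w + d||_1 over d by proximal gradient.
    H is assumed symmetric positive definite (e.g. an L-BFGS approximation)."""
    d = np.zeros_like(w)
    L = np.linalg.norm(H, 2)                 # Lipschitz constant of the model's smooth part
    for _ in range(inner_iters):
        model_grad = grad + H @ d
        d = soft_threshold(w + d - model_grad / L, lam / L) - w
    return w + d
```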

DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization

Machine learning with big data often involves large optimization models. For distributed optimization over a cluster of machines, frequent communication and synchronization of all model parameters (optimization variables) can be very costly. A promising solution is to use parameter servers to store different subsets of the model parameters, and update them asynchronously at different machines …
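
A serial toy sketch of the parameter-partitioning idea: the model is split into blocks, one per server, and each update touches a single random data point and a single random block, so a worker only needs to talk to one server at a time. This mimics the communication pattern only; it is not the DSCOVR primal-dual algorithm, and all names are illustrative.

```python
import numpy as np

def randomized_block_updates(X, y, num_servers=4, step=0.1, iters=1000, seed=0):
    """Block-wise stochastic gradient updates for least squares, with the
    parameters partitioned into num_servers blocks (one block per 'server')."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    blocks = np.array_split(np.arange(d), num_servers)   # one block per server
    for _ in range(iters):
        i = rng.integers(n)                     # random data point (a worker's sample)
        b = blocks[rng.integers(num_servers)]   # random parameter block (one server)
        resid = X[i] @ w - y[i]
        w[b] -= step * resid * X[i, b]          # update only that block
    return w
```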

On the convergence of stochastic bi-level gradient methods

We analyze the convergence of stochastic gradient methods for bi-level optimization problems. We address two specific cases: first, when the outer objective function can be expressed as a finite sum of independent terms, and second, when both the outer and inner objective functions can be expressed as finite sums of independent terms. We assume Lipschitz …
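
In symbols, the setting described is the bi-level finite-sum problem below; the notation and the standard hypergradient formula (valid under suitable smoothness and a strongly convex inner problem) are added here for orientation and are not quoted from the paper.

```latex
% Bi-level finite-sum formulation (outer and inner objectives are both finite sums)
\min_{x}\; F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i\bigl(x, y^{*}(x)\bigr),
\qquad
y^{*}(x) = \arg\min_{y}\; G(x,y), \quad G(x,y) = \frac{1}{m}\sum_{j=1}^{m} g_j(x, y)

% Standard hypergradient under a smooth, strongly convex inner problem
\nabla F(x) = \nabla_x f\bigl(x, y^{*}(x)\bigr)
  - \nabla^2_{xy} G\bigl(x, y^{*}(x)\bigr)\,
    \bigl[\nabla^2_{yy} G\bigl(x, y^{*}(x)\bigr)\bigr]^{-1}
    \nabla_y f\bigl(x, y^{*}(x)\bigr),
\qquad f(x,y) = \frac{1}{n}\sum_{i=1}^{n} f_i(x, y).
```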

Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss

We consider distributed convex optimization problems originating from sample average approximation of stochastic optimization, or empirical risk minimization in machine learning. We assume that each machine in the distributed computing system has access to a local empirical loss function, constructed with i.i.d. data sampled from a common distribution. We propose a communication-efficient distributed algorithm to …
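
A minimal sketch of the local objective each machine would hold in this setting, using a regularized logistic loss on that machine's i.i.d. shard; the global objective is the average of these local losses, and only gradients (or Hessian-vector products, for second-order schemes) need to cross the network. The loss choice and names are illustrative, not the paper's algorithm.

```python
import numpy as np

def local_loss_and_grad(w, X_k, y_k, reg=1e-3):
    """Regularized logistic loss and gradient on machine k's data shard (X_k, y_k).

    The global objective is the average of these local losses over all machines;
    the l2 term keeps the problem strongly convex. Illustrative sketch only.
    """
    z = y_k * (X_k @ w)                              # margins on the local shard
    loss = np.mean(np.log1p(np.exp(-z))) + 0.5 * reg * (w @ w)
    grad = X_k.T @ (-y_k / (1.0 + np.exp(z))) / len(y_k) + reg * w
    return loss, grad
```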

Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization

We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variable. An …
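
The reformulation referred to is the standard conjugate-based saddle point form for linear predictors, written out below for orientation; here phi_i^* denotes the convex conjugate of the i-th loss phi_i and a_i the i-th data vector (notation assumed, not quoted from the paper).

```latex
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n}\phi_i\bigl(a_i^{\top}w\bigr) + g(w)
\;\;\Longleftrightarrow\;\;
\min_{w}\,\max_{\alpha\in\mathbb{R}^{n}}\;
\frac{1}{n}\sum_{i=1}^{n}\bigl(\alpha_i\, a_i^{\top}w - \phi_i^{*}(\alpha_i)\bigr) + g(w).
```

A stochastic primal-dual method of this kind then updates one randomly chosen dual coordinate alpha_i and the primal variable w at each iteration, which is the alternation described above.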

An Accelerated Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization

We consider the problem of minimizing the sum of two convex functions: one is smooth and given by a gradient oracle, and the other is separable over blocks of coordinates and has a simple known structure over each block. We develop an accelerated randomized proximal coordinate gradient (APCG) method for minimizing such convex composite functions. …
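
For context, a sketch of the basic (non-accelerated) randomized proximal coordinate gradient method for the composite problem with an l1 term, which is separable over coordinates; the extrapolation step that gives APCG its acceleration is not shown, and grad_f, L, and the regularizer are illustrative assumptions.

```python
import numpy as np

def rand_prox_coordinate_descent(grad_f, L, lam, d, iters=5000, seed=0):
    """Randomized proximal coordinate gradient for min_x f(x) + lam*||x||_1.

    grad_f(x, i) returns the i-th partial derivative of the smooth part f,
    and L[i] is its coordinate-wise Lipschitz constant. Each step updates one
    random coordinate with a gradient step followed by soft-thresholding
    (the proximal map of the separable l1 term).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(d)
        u = x[i] - grad_f(x, i) / L[i]                        # coordinate gradient step
        x[i] = np.sign(u) * max(abs(u) - lam / L[i], 0.0)     # soft-threshold
    return x
```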