A Data Efficient and Feasible Level Set Method for Stochastic Convex Optimization with Expectation Constraints

Stochastic convex optimization problems with expectation constraints (SOECs) are encountered in statistics and machine learning, business, and engineering. In data-rich environments, the SOEC objective and constraints contain expectations defined with respect to large datasets. Therefore, efficient algorithms for solving such SOECs need to limit the fraction of data points that they use, which we refer … Read more

Distributionally Robust Optimization with Confidence Bands for Probability Density Functions

Distributionally robust optimization (DRO) has been introduced for solving stochastic programs where the distribution of the random parameters is unknown and must be estimated from samples drawn from that distribution. A key element of DRO is the construction of the ambiguity set, which is a set of distributions that covers the true distribution with a high … Read more

A Level-set Method For Convex Optimization with a Feasible Solution Path

Large-scale constrained convex optimization problems arise in several application domains. First-order methods are good candidates to tackle such problems due to their low iteration complexity and memory requirement. The level-set framework extends the applicability of first-order methods to tackle problems with complicated convex objectives and constraint sets. Current methods based on this framework either rely … Read more

DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization

Machine learning with big data often involves large optimization models. For distributed optimization over a cluster of machines, frequent communication and synchronization of all model parameters (optimization variables) can be very costly. A promising solution is to use parameter servers to store different subsets of the model parameters, and update them asynchronously at different machines … Read more

Revisiting Approximate Linear Programming Using a Saddle Point Approach

Approximate linear programs (ALPs) are well-known models for computing value function approximations (VFAs) of intractable Markov decision processes (MDPs) arising in applications. VFAs from ALPs have desirable theoretical properties, define an operating policy, and provide a lower bound on the optimal policy cost, which can be used to assess the suboptimality of heuristic policies. However, … Read more

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/epsilon)

In this paper, we develop a novel homotopy smoothing (HOPS) algorithm for solving a family of non-smooth problems that is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving … Read more

RSG: Beating Subgradient Method without Smoothness and Strong Convexity

In this paper, we study the efficiency of a Restarted SubGradient (RSG) method that periodically restarts the standard subgradient method (SG). We show that, when applied to a broad class of convex optimization problems, the RSG method can find an $\epsilon$-optimal solution with lower complexity than the SG method. In particular, we first … Read more

Distributed Stochastic Variance Reduced Gradient Methods and a Lower Bound for Communication Complexity

We study distributed optimization algorithms for minimizing the average of convex functions. The applications include empirical risk minimization problems in statistical machine learning where the datasets are large and have to be stored on different machines. We design a distributed stochastic variance reduced gradient algorithm that, under certain conditions on the condition number, simultaneously achieves … Read more

An Accelerated Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization

We consider the problem of minimizing the sum of two convex functions: one is smooth and given by a gradient oracle, and the other is separable over blocks of coordinates and has a simple known structure over each block. We develop an accelerated randomized proximal coordinate gradient (APCG) method for minimizing such convex composite functions. … Read more

An adaptive accelerated proximal gradient method and its homotopy continuation for sparse optimization

We consider optimization problems with an objective function that is the sum of two convex terms: one is smooth and given by a black-box oracle, and the other is general but with a simple, known structure. We first present an accelerated proximal gradient (APG) method for problems where the smooth part of the objective function … Read more