A single potential governing convergence of conjugate gradient, accelerated gradient and geometric descent

Nesterov’s accelerated gradient (AG) method for minimizing a smooth strongly convex function $f$ is known to reduce $f({\bf x}_k)-f({\bf x}^*)$ by a factor of $\epsilon\in(0,1)$ after $k=O(\sqrt{L/\ell}\log(1/\epsilon))$ iterations, where $\ell,L$ are the two parameters of smooth strong convexity. Furthermore, it is known that this is the best possible complexity in the function-gradient oracle model of … Read more

Long-Step Path-Following Algorithm for Solving Symmetric Programming Problems with Nonlinear Objective Functions

We describe a long-step path-following algorithm for a class of symmetric programming problems with nonlinear convex objective functions. The complexity estimates similar to the case of a linear-quadratic objective function are established. The results of numerical experiments for the class of optimization problems involving quantum entropy are presented. Citation Preprint, University of Notre Dame, December … Read more

Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity

We generalize the classic convergence rate theory for subgradient methods to apply to non-Lipschitz functions via a new measure of steepness. For the deterministic projected subgradient method, we derive a global $O(1/\sqrt{T})$ convergence rate for any function with at most exponential growth. Our approach implies generalizations of the standard convergence rates for gradient descent on … Read more

”Active-set complexity” of proximal gradient: How long does it take to find the sparsity pattern?

Proximal gradient methods have been found to be highly effective for solving minimization problems with non-negative constraints or L1-regularization. Under suitable nondegeneracy conditions, it is known that these algorithms identify the optimal sparsity pattern for these types of problems in a finite number of iterations. However, it is not known how many iterations this may … Read more

Bootstrap Robust Prescriptive Analytics

We address the problem of prescribing an optimal decision in a framework where its cost depends on uncertain problem parameters $Y$ that need to be learned from data. Earlier work by Bertsimas and Kallus (2014) transforms classical machine learning methods that merely predict $Y$ from supervised training data $[(x_1, y_1), \dots, (x_n, y_n)]$ into prescriptive … Read more

Random Gradient Extrapolation for Distributed and Stochastic Optimization

In this paper, we consider a class of finite-sum convex optimization problems defined over a distributed multiagent network with $m$ agents connected to a central server. In particular, the objective function consists of the average of $m$ ($\ge 1$) smooth components associated with each network agent together with a strongly convex term. Our major contribution … Read more

A One-Parameter Family of Middle Proximal ADMM for Constrained Separable Convex Optimization

This work is devoted to studying a family of Middle Proximal Alternating Direction Method of Multipliers (MP-ADM) for solving multi-block constrained separable convex optimization. Such one-parameter family of MP-ADM combines both Jacobian and Gauss-Seidel types of alternating direction method, and proximal point techniques are only applied to the middle subproblems to promote the convergence. We … Read more

Set-Completely-Positive Representations and Cuts for the Max-Cut Polytope and the Unit Modulus Lifting

This paper considers a generalization of the “max-cut-polytope” $\conv\{\ xx^T\mid x\in\real^n, \ \ |x_k| = 1 \ \hbox{for} \ 1\le k\le n\}$ in the space of real symmetric $n\times n$-matrices with all-ones-diagonal to a complex “unit modulus lifting” $\conv\{xx\HH\mid x\in\complex^n, \ \ |x_k| = 1 \ \hbox{for} \ 1\le k\le n\}$ in the space of … Read more

Adaptive Sampling Strategies for Stochastic Optimization

In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the regular computation of full gradients, the proposed method reduces variance by increasing the sample size as needed. The decision to increase … Read more

Convergence rates of accelerated proximal gradient algorithms under independent noise

We consider an accelerated proximal gradient algorithm for the composite optimization with “independent errors” (errors little related with historical information) for solving linear inverse problems. We present a new inexact version of FISTA algorithm considering deterministic and stochastic noises. We prove some convergence rates of the algorithm and we connect it with the current existing … Read more