Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method for Empirical Risk Minimization

\(\) The Frank-Wolfe method has become increasingly useful in statistical and machine learning applications, due to the structure-inducing properties of the iterates, and especially in settings where linear minimization over the feasible set is more computationally efficient than projection. In the setting of Empirical Risk Minimization — one of the fundamental optimization problems in statistical … Read more

Analysis of the Frank-Wolfe Method for Convex Composite Optimization involving a Logarithmically-Homogeneous Barrier

We present and analyze a new generalized Frank-Wolfe method for the composite optimization problem (P): F*:= min_x f(Ax) + h(x), where f is a \theta-logarithmically-homogeneous self-concordant barrier and the function h has bounded domain but is possibly non-smooth. We show that our generalized Frank-Wolfe method requires O((Gap_0 + \theta + Var_h)\ln(\delta_0) + (\theta + Var_h)^2/\epsilon) … Read more

An Oblivious Ellipsoid Algorithm for Solving a System of (In)Feasible Linear Inequalities

The ellipsoid algorithm is a fundamental algorithm for computing a solution to the system of m linear inequalities in n variables (P) when its set of solutions has positive volume. However, when (P) is infeasible, the ellipsoid algorithm has no mechanism for proving that (P) is infeasible. This is in contrast to the other two … Read more

Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods

Logistic regression is one of the most popular methods in binary classification, wherein estimation of model parameters is carried out by solving the maximum likelihood (ML) optimization problem, and the ML estimator is defined to be the optimal solution of this problem. It is well known that the ML estimator exists when the data is … Read more

Generalized Stochastic Frank-Wolfe Algorithm with Stochastic “Substitute” Gradient for Structured Convex Optimization

The stochastic Frank-Wolfe method has recently attracted much general interest in the context of optimization for statistical and machine learning due to its ability to work with a more general feasible region. However, there has been a complexity gap in the guaranteed convergence rate for stochastic Frank-Wolfe compared to its deterministic counterpart. In this work, … Read more

Relatively-Smooth Convex Optimization by First-Order Methods, and Applications

The usual approach to developing and analyzing first-order methods for smooth convex optimization assumes that the gradient of the objective function is uniformly smooth with some Lipschitz constant L. However, in many settings the differentiable convex function f(.) is not uniformly smooth — for example in D-optimal design where f(x):=-ln det(HXH^T), or even the univariate … Read more

An Extended Frank-Wolfe Method with “In-Face” Directions, and its Application to Low-Rank Matrix Completion

We present an extension of the Frank-Wolfe method that is designed to induce near-optimal solutions on low-dimensional faces of the feasible region. We present computational guarantees for the method that trade off efficiency in computing near-optimal solutions with upper bounds on the dimension of minimal faces of iterates. We apply our method to the low-rank … Read more

New Computational Guarantees for Solving Convex Optimization Problems with First Order Methods, via a Function Growth Condition Measure

Motivated by recent work of Renegar, we present new computational methods and associated computational guarantees for solving convex optimization problems using first-order methods. Our problem of interest is the general convex optimization problem f^* = \min_{x \in Q} f(x), where we presume knowledge of a strict lower bound f_slb < f^*. [Indeed, f_slb is naturally ... Read more

A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives

In this paper we analyze boosting algorithms in linear regression from a new perspective: that of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression, namely the incremental forward stagewise algorithm (FS-epsilon) and least squares boosting (LS-Boost-epsilon), can be viewed as subgradient descent to minimize the loss function defined … Read more

Fabrication-Adaptive Optimization, with an Application to Photonic Crystal Design

It is often the case that the computed optimal solution of an optimization problem cannot be implemented directly, irrespective of data accuracy, due to either (i) technological limitations (such as physical tolerances of machines or processes), (ii) the deliberate simplification of a model to keep it tractable (by ignoring certain types of constraints that pose … Read more