Projected proximal gradient trust-region algorithm for nonsmooth optimization

We consider trust-region methods for solving optimization problems where the objective is the sum of a smooth, nonconvex function and a nonsmooth, convex regularizer. We extend the global convergence theory of such methods to include worst-case complexity bounds in the case of unbounded model Hessian growth, and introduce a new, simple nonsmooth trust-region subproblem solver …
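The new subproblem solver itself is not reproduced in this excerpt, but the basic building block of such methods can be sketched. Below is a minimal, illustrative sketch (not the paper's solver) of one proximal gradient step on a quadratic model of the smooth part plus an assumed ℓ1 regularizer, with the iterate projected back onto the trust region; the function names, the ℓ1 choice, and the projection step are assumptions for illustration.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proj_trust_region(v, center, radius):
    """Euclidean projection onto the ball {x : ||x - center|| <= radius}."""
    d = v - center
    nd = np.linalg.norm(d)
    return v if nd <= radius else center + (radius / nd) * d

def projected_prox_grad_step(x, g, H, center, radius, step, lam):
    """One proximal gradient step on a quadratic model of the smooth part plus
    lam * ||x||_1, with the iterate projected back into the trust region."""
    grad_q = g + H @ (x - center)                 # gradient of the quadratic model at x
    y = prox_l1(x - step * grad_q, step * lam)    # forward-backward step on model + l1 term
    return proj_trust_region(y, center, radius)   # enforce ||x - center|| <= radius
```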

Randomized Subspace Derivative-Free Optimization with Quadratic Models and Second-Order Convergence

We consider model-based derivative-free optimization (DFO) for large-scale problems, based on iterative minimization in random subspaces. We provide the first worst-case complexity bound for such methods for convergence to approximate second-order critical points, and show that these bounds have significantly improved dimension dependence compared to standard full-space methods, provided low-accuracy solutions are desired and/or …
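As a rough illustration of the subspace idea (not the paper's algorithm), one can draw a random sketch matrix Q with few columns and run any derivative-free solver on the low-dimensional reduced function u ↦ f(x + Qu). In the sketch below, SciPy's Nelder-Mead stands in for the quadratic-model method; all names, the Gaussian sketch, and the parameter choices are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def subspace_dfo_step(f, x, p=5, rng=None):
    """Approximately minimize f over the random affine subspace x + range(Q),
    where Q is an n-by-p scaled Gaussian sketch (illustrative choice)."""
    rng = np.random.default_rng(rng)
    n = x.size
    Q = rng.standard_normal((n, p)) / np.sqrt(p)                # random subspace basis
    reduced = lambda u: f(x + Q @ u)                            # p-dimensional reduced problem
    res = minimize(reduced, np.zeros(p), method="Nelder-Mead")  # any DFO solver works here
    return x + Q @ res.x                                        # map the subspace solution back

# Example: a few outer iterations on a simple test function in dimension 100
f = lambda x: np.sum((x**2 - 1.0) ** 2) + 0.1 * np.sum(x**2)
x = np.full(100, 3.0)
for _ in range(10):
    x = subspace_dfo_step(f, x, p=5)
```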

Black-box Optimization Algorithms for Regularized Least-squares Problems

We consider the problem of optimizing the sum of a smooth, nonconvex function for which derivatives are unavailable, and a convex, nonsmooth function with an easy-to-evaluate proximal operator. Of particular focus is the case where the smooth part has a nonlinear least-squares structure. We adapt two existing approaches for derivative-free optimization of nonsmooth compositions of smooth …
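As a purely illustrative sketch of how the least-squares structure can be exploited in the regularized setting (not either of the paper's methods): given residuals r(x) and a Jacobian estimate J (which in the derivative-free setting would come from interpolation models rather than derivatives), a regularized Gauss-Newton model can be approximately minimized with a few proximal gradient iterations. The ℓ1 regularizer and all names below are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gauss_newton_step(x, r, J, lam, n_inner=50):
    """Approximately solve min_s 0.5*||r + J s||^2 + lam*||x + s||_1
    (a Gauss-Newton model of the smooth least-squares part plus the regularizer)
    using proximal gradient iterations on the model."""
    L = np.linalg.norm(J, 2) ** 2 + 1e-12     # Lipschitz constant of the model gradient
    s = np.zeros_like(x)
    for _ in range(n_inner):
        grad = J.T @ (r + J @ s)              # gradient of the Gauss-Newton model at s
        s = soft_threshold(x + s - grad / L, lam / L) - x   # prox step in the x + s variable
    return x + s
```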

Model Construction for Convex-Constrained Derivative-Free Optimization

We develop a new approximation theory for linear and quadratic interpolation models, suitable for use in convex-constrained derivative-free optimization (DFO). Most existing model-based DFO methods for constrained problems assume the ability to construct sufficiently accurate approximations via interpolation, but the standard notions of accuracy (designed for unconstrained problems) may not be achievable by only sampling …
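To make concrete the kind of model the theory concerns (a generic construction, not the paper's new theory), a linear interpolation model c + gᵀ(y - y₀) can be fit to function values at n+1 sample points by solving a small linear system; the sample points below are kept inside a box as a simple stand-in for a convex feasible region, and all names and data are illustrative.

```python
import numpy as np

def linear_interpolation_model(f, points):
    """Fit m(y) = c + g.(y - points[0]) so that m(y_i) = f(y_i) at all sample points.
    `points` is an (n+1)-by-n array of sample points (assumed affinely independent)."""
    y0 = points[0]
    fvals = np.array([f(y) for y in points])
    M = points[1:] - y0                              # n-by-n matrix of displacements
    g = np.linalg.solve(M, fvals[1:] - fvals[0])     # interpolation conditions for the gradient
    return fvals[0], g                               # model value at y0 and model gradient

# Example: sample points restricted to the box [0, 1]^2 (a simple convex feasible set)
f = lambda y: np.exp(y[0]) + y[1] ** 2
pts = np.array([[0.5, 0.5], [0.9, 0.5], [0.5, 0.9]])
c, g = linear_interpolation_model(f, pts)
```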

Expected decrease for derivative-free algorithms using random subspaces

Derivative-free algorithms seek the minimum of a given function based only on function values queried at appropriate points. Although these methods are widely used in practice, their performance is known to worsen as the problem dimension increases. Recent advances in developing randomized derivative-free techniques have tackled this issue by working in low-dimensional subspaces that are …
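As a small numerical illustration of the kind of quantity involved (not an experiment from the paper), the expected first-order decrease available along a single random unit direction shrinks with the ambient dimension, which is what analyses of random-subspace methods must account for. The Monte Carlo setup below is an assumption for illustration.

```python
import numpy as np

def expected_linear_decrease(n, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max(0, -g.d)] for a fixed gradient g with ||g|| = 1
    and d uniform on the unit sphere in R^n: the expected decrease of the
    first-order model along one random direction."""
    rng = np.random.default_rng(seed)
    g = np.zeros(n)
    g[0] = 1.0                                       # WLOG align g with the first axis
    d = rng.standard_normal((n_samples, n))
    d /= np.linalg.norm(d, axis=1, keepdims=True)    # uniform directions on the sphere
    return np.mean(np.maximum(0.0, -d @ g))

for n in (10, 100, 1000):
    print(n, expected_linear_decrease(n))   # decays roughly like 1/sqrt(n)
```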

Analyzing Inexact Hypergradients for Bilevel Learning

Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here, the exact gradient with respect to the hyperparameters cannot feasibly be computed, and approximate strategies are required. We introduce a unified framework for computing hypergradients that …
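As a toy illustration of the object being approximated (not the paper's framework), take a ridge-regression lower-level problem with a scalar hyperparameter theta: the exact hypergradient follows from the implicit function theorem, and an inexact hypergradient arises whenever the lower-level solve or the linear solve is only approximate. All problem data below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x_target = rng.standard_normal(10)
theta = 0.5

# Lower-level problem: x(theta) = argmin_x 0.5*||A x - b||^2 + 0.5*theta*||x||^2
H = A.T @ A + theta * np.eye(10)
x = np.linalg.solve(H, A.T @ b)

# Upper-level loss: l(x) = 0.5*||x - x_target||^2
grad_l = x - x_target

# Implicit function theorem: dx/dtheta = -H^{-1} x, so dl/dtheta = grad_l . dx/dtheta
hypergrad_exact = grad_l @ (-np.linalg.solve(H, x))

# Finite-difference check of the hypergradient
eps = 1e-6
x_eps = np.linalg.solve(A.T @ A + (theta + eps) * np.eye(10), A.T @ b)
hypergrad_fd = (0.5 * np.sum((x_eps - x_target) ** 2)
                - 0.5 * np.sum((x - x_target) ** 2)) / eps
print(hypergrad_exact, hypergrad_fd)
```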

A Simplified Convergence Theory for Byzantine Resilient Stochastic Gradient Descent

In distributed learning, a central server trains a model according to updates provided by nodes holding local data samples. In the presence of one or more malicious nodes sending incorrect information (a Byzantine adversary), standard algorithms for model training such as stochastic gradient descent (SGD) fail to converge. In this paper, we present a simplified …
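One standard ingredient of Byzantine-resilient SGD (used here for illustration, and not necessarily the aggregator analysed in the paper) is replacing the mean of the workers' gradient estimates with a robust aggregator such as the coordinate-wise median, which a small number of arbitrary updates cannot drag arbitrarily far. All data below are illustrative.

```python
import numpy as np

def aggregate_mean(grads):
    """Standard SGD aggregation: average of the workers' gradient estimates."""
    return np.mean(grads, axis=0)

def aggregate_median(grads):
    """Coordinate-wise median: a simple Byzantine-robust aggregation rule."""
    return np.median(grads, axis=0)

rng = np.random.default_rng(1)
honest = rng.normal(loc=1.0, scale=0.1, size=(8, 5))   # 8 honest workers, true gradient ~ 1
byzantine = np.full((2, 5), -100.0)                    # 2 adversarial workers send garbage
grads = np.vstack([honest, byzantine])

print(aggregate_mean(grads))    # pulled far from the true gradient by the adversary
print(aggregate_median(grads))  # stays close to the honest gradients
```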

Direct search based on probabilistic descent in reduced spaces

Derivative-free algorithms seek the minimum value of a given objective function without using any derivative information. The performance of these methods often worsens as the dimension increases, a phenomenon predicted by their worst-case complexity guarantees. Nevertheless, recent algorithmic proposals have shown that incorporating randomization into otherwise deterministic frameworks could alleviate this effect for direct-search methods. …
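A minimal sketch of one poll step of direct search with random directions drawn inside a low-dimensional random subspace (illustrative only; the names, constants, and Gaussian subspace choice are assumptions): the step size grows after a successful poll and shrinks otherwise, according to a sufficient decrease test.

```python
import numpy as np

def poll_step(f, x, alpha, p=2, n_dirs=2, c=1e-3, rng=None):
    """One direct-search poll: try n_dirs random directions drawn inside a random
    p-dimensional subspace; accept the first point giving sufficient decrease."""
    rng = np.random.default_rng(rng)
    n = x.size
    Q = rng.standard_normal((n, p)) / np.sqrt(p)        # random subspace basis
    fx = f(x)
    for _ in range(n_dirs):
        d = Q @ rng.standard_normal(p)
        d /= np.linalg.norm(d)
        if f(x + alpha * d) < fx - c * alpha**2:        # sufficient decrease condition
            return x + alpha * d, 2.0 * alpha           # success: expand the step size
    return x, 0.5 * alpha                               # failure: shrink the step size

# Example usage on a simple quadratic in dimension 50
f = lambda x: np.sum(x**2)
x, alpha = np.full(50, 1.0), 1.0
for _ in range(200):
    x, alpha = poll_step(f, x, alpha)
```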

Model-Based Derivative-Free Methods for Convex-Constrained Optimization

We present a model-based derivative-free method for optimization subject to general convex constraints, which we assume are unrelaxable and accessed only through a projection operator that is cheap to evaluate. We prove global convergence and a worst-case complexity of $O(\epsilon^{-2})$ iterations and objective evaluations for nonconvex functions, matching results for the unconstrained case. We introduce …
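To make the role of the projection operator concrete (an illustrative sketch, not necessarily the criticality measure used in the paper), here is a cheap projection onto a box together with the standard projected-gradient stationarity measure ||P_C(x - g) - x||, where g is a gradient or model gradient; the constraint set and data below are assumptions.

```python
import numpy as np

def project_box(x, lower, upper):
    """Euclidean projection onto the box {x : lower <= x <= upper} (cheap to evaluate)."""
    return np.clip(x, lower, upper)

def projected_gradient_measure(x, g, lower, upper):
    """A standard stationarity measure for convex-constrained problems:
    ||P_C(x - g) - x||, which is zero exactly at first-order critical points."""
    return np.linalg.norm(project_box(x - g, lower, upper) - x)

x = np.array([0.0, 1.0, 0.3])
g = np.array([-2.0, 0.5, 0.0])            # (model) gradient at x, illustrative values
print(projected_gradient_measure(x, g, lower=0.0, upper=1.0))
```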

Scalable Subspace Methods for Derivative-Free Nonlinear Least-Squares Optimization

We introduce a general framework for large-scale model-based derivative-free optimization based on iterative minimization within random subspaces. We present a probabilistic worst-case complexity analysis for our method, where in particular we prove high-probability bounds on the number of iterations before a given level of optimality is achieved. This framework is specialized to nonlinear least-squares problems, with a …
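As a rough sketch of how the least-squares specialization can exploit structure within a subspace (illustrative only, not the framework's actual model construction): estimate the Jacobian of the residuals along p random subspace directions by finite differences, then take a damped Gauss-Newton step within that subspace. All names and parameters below are assumptions.

```python
import numpy as np

def subspace_gauss_newton_step(resid, x, p=5, h=1e-6, damping=1e-3, rng=None):
    """Estimate the residual Jacobian only along p random subspace directions
    (p + 1 residual evaluations) and take a regularized Gauss-Newton step x + Q u."""
    rng = np.random.default_rng(rng)
    n = x.size
    Q = rng.standard_normal((n, p)) / np.sqrt(p)       # random subspace basis
    r0 = resid(x)
    JQ = np.column_stack([(resid(x + h * Q[:, i]) - r0) / h for i in range(p)])
    # Reduced (damped) Gauss-Newton system: (JQ^T JQ + damping I) u = -JQ^T r0
    u = np.linalg.solve(JQ.T @ JQ + damping * np.eye(p), -JQ.T @ r0)
    return x + Q @ u

# Example: residuals of an overdetermined nonlinear system in dimension 100
resid = lambda x: np.concatenate([x**2 - 1.0, 0.1 * x])
x = np.full(100, 2.0)
for _ in range(20):
    x = subspace_gauss_newton_step(resid, x, p=10)
```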