Polling Set Construction and Worst-Case Complexity for Direct Search under Polyhedral Convex Constraints

Direct search is one of the most popular derivative-free optimization paradigms, that relies on exploring the variable space using polling directions. To analyze and implement direct search, one typically relies on positive spanning sets. This concept is somewhat decorrelated from interpolation-based sets used in model-based algorithms, another class of derivative-free optimization methods. This discrepancy is … Read more

Accuracy and Relationships of Quadratic Models in Derivative-free Optimization

We study three quadratic models in model-based derivative-free optimization: the minimum norm (MN), minimum Frobenius norm (MFN), and quadratic generalized simplex derivative (QS) models. Despite their widespread use, their approximation accuracy and relationships have not been systematically explored. We establish fully linear error bounds for all three models, removing the uniformly bounded model Hessian assumption … Read more

Projected proximal gradient trust-region algorithm for nonsmooth optimization

We consider trust-region methods for solving optimization problems where the objective is the sum of a smooth, nonconvex function and a nonsmooth, convex regularizer. We extend the global convergence theory of such methods to include worst-case complexity bounds in the case of unbounded model Hessian growth, and introduce a new, simple nonsmooth trust-region subproblem solver … Read more

Randomized Subspace Derivative-Free Optimization with Quadratic Models and Second-Order Convergence

We consider model-based derivative-free optimization (DFO) for large-scale problems, based on iterative minimization in random subspaces. We provide the first worst-case complexity bound for such methods for convergence to approximate second-order critical points, and show that these bounds have significantly improved dimension dependence compared to standard full-space methods, provided low accuracy solutions are desired and/or … Read more

Black-box Optimization Algorithms for Regularized Least-squares Problems

We consider the problem of optimizing the sum of a smooth, nonconvex function for which derivatives are unavailable, and a convex, nonsmooth function with easy-to-evaluate proximal operator. Of particular focus is the case where the smooth part has a nonlinear least-squares structure. We adapt two existing approaches for derivative-free optimization of nonsmooth compositions of smooth … Read more

Model Construction for Convex-Constrained Derivative-Free Optimization

We develop a new approximation theory for linear and quadratic interpolation models, suitable for use in convex-constrained derivative-free optimization (DFO). Most existing model-based DFO methods for constrained problems assume the ability to construct sufficiently accurate approximations via interpolation, but the standard notions of accuracy (designed for unconstrained problems) may not be achievable by only sampling … Read more

Expected decrease for derivative-free algorithms using random subspaces

Derivative-free algorithms seek the minimum of a given function based only on function values queried at appropriate points. Although these methods are widely used in practice, their performance is known to worsen as the problem dimension increases. Recent advances in developing randomized derivative-free techniques have tackled this issue by working in low-dimensional subspaces that are … Read more

Analyzing Inexact Hypergradients for Bilevel Learning

Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters cannot be feasibly computed and approximate strategies are required. We introduce a unified framework for computing hypergradients that … Read more

A Simplified Convergence Theory for Byzantine Resilient Stochastic Gradient Descent

In distributed learning, a central server trains a model according to updates provided by nodes holding local data samples. In the presence of one or more malicious servers sending incorrect information (a Byzantine adversary), standard algorithms for model training such as stochastic gradient descent (SGD) fail to converge. In this paper, we present a simplified … Read more

Direct search based on probabilistic descent in reduced spaces

Derivative-free algorithms seek the minimum value of a given objective function without using any derivative information. The performance of these methods often worsen as the dimension increases, a phenomenon predicted by their worst-case complexity guarantees. Nevertheless, recent algorithmic proposals have shown that incorporating randomization into otherwise deterministic frameworks could alleviate this effect for direct-search methods. … Read more