Continuous exact relaxation and alternating proximal gradient algorithm for partial sparse and partial group sparse optimization problems

In this paper, we consider a partial sparse and partial group sparse optimization problem, where the loss function is continuously differentiable (possibly nonconvex) and the penalty term consists of two parts associated with sparsity and group sparsity. The first part is the $\ell_0$ norm of ${\bf x}$, and the second part is the $\ell_{2,0}$ … Read more
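As a rough illustration of the penalty structure, here is a minimal Python sketch, assuming a user-chosen split of the variables into an element-wise sparse block and a collection of groups; the weights `lam1`, `lam2` and the index sets are hypothetical, and the full formulation is given in the paper.

```python
import numpy as np

def partial_sparse_penalty(x, sparse_idx, groups, lam1=1.0, lam2=1.0):
    """Illustrative penalty: lam1 * ||x_S||_0 + lam2 * ||x_G||_{2,0}.

    `sparse_idx` indexes the entries penalized by the l0 norm, and
    `groups` is a list of index arrays penalized by the l_{2,0} norm
    (the number of groups with nonzero Euclidean norm)."""
    l0 = np.count_nonzero(x[sparse_idx])
    l20 = sum(1 for g in groups if np.linalg.norm(x[g]) > 0)
    return lam1 * l0 + lam2 * l20

# Example: first two coordinates are element-wise sparse, the rest form two groups.
x = np.array([0.0, 1.5, 0.0, 0.0, 2.0, -1.0])
print(partial_sparse_penalty(x, sparse_idx=[0, 1],
                             groups=[np.array([2, 3]), np.array([4, 5])]))
```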

Accelerated Gradient Descent via Long Steps

Recently, Grimmer [1] showed that for smooth convex optimization, by periodically utilizing longer steps, gradient descent's state-of-the-art O(1/T) convergence guarantee can be improved by constant factors, and conjectured that an accelerated rate strictly faster than O(1/T) might be possible. Here we prove such a big-O gain, establishing gradient descent's first accelerated convergence rate in this setting. Namely, we … Read more
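A minimal sketch of the long-step idea is below: plain gradient descent on an L-smooth convex function where every `period`-th iteration takes a longer step. The schedule (`period`, `long_factor`) is hypothetical and is not the certified stepsize pattern of [1] or of this paper.

```python
import numpy as np

def gd_with_long_steps(grad, x0, L, iters=100, period=8, long_factor=4.0):
    """Gradient descent that takes a longer step every `period` iterations.
    The schedule here is an illustration only."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, iters + 1):
        step = long_factor / L if k % period == 0 else 1.0 / L
        x = x - step * grad(x)
    return x

# Example on a simple convex quadratic f(x) = 0.5 * x' A x.
A = np.diag([1.0, 10.0])
print(gd_with_long_steps(lambda x: A @ x, x0=[5.0, 5.0], L=10.0))
```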

Using orthogonally structured positive bases for constructing positive k-spanning sets with cosine measure guarantees

Positive spanning sets span a given vector space by nonnegative linear combinations of their elements. These have attracted significant attention in recent years, owing to their extensive use in derivative-free optimization. In this setting, the quality of a positive spanning set is assessed through its cosine measure, a geometric quantity that expresses how well … Read more
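For concreteness, the cosine measure of a set \(D\) of nonzero directions is \(\mathrm{cm}(D) = \min_{\|u\|=1} \max_{d \in D} u^\top d / \|d\|\). The sketch below only produces a Monte Carlo upper bound on this quantity by sampling unit vectors; exact evaluation requires solving a separate subproblem and is not attempted here.

```python
import numpy as np

def cosine_measure_upper_bound(D, n_samples=200000, seed=0):
    """Monte Carlo upper bound on the cosine measure of the columns of D:
        cm(D) = min_{||u||=1} max_i  u' d_i / ||d_i||.
    Sampling unit vectors u only yields an upper bound."""
    rng = np.random.default_rng(seed)
    Dn = D / np.linalg.norm(D, axis=0)           # normalize columns
    U = rng.standard_normal((D.shape[0], n_samples))
    U /= np.linalg.norm(U, axis=0)               # random unit vectors as columns
    return (U.T @ Dn).max(axis=1).min()

# Minimal positive basis of R^2: e1, e2, -(e1 + e2).
D = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0]])
print(cosine_measure_upper_bound(D))
```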

Limited memory gradient methods for unconstrained optimization

The limited memory steepest descent method (Fletcher, 2012) for unconstrained optimization problems stores a few past gradients to compute multiple stepsizes at once. We review this method and propose new variants. For strictly convex quadratic objective functions, we study the numerical behavior of different techniques to compute new stepsizes. In particular, we introduce a method … Read more
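A toy version of the idea, assuming a strictly convex quadratic \(f(x) = \tfrac12 x^\top A x - b^\top x\) with \(A\) available, is sketched below: a few gradients are stored, an orthonormal basis of their span is formed, and the reciprocals of the resulting Ritz values are used as stepsizes. Fletcher's method recovers these Ritz values from the stored gradients alone; the construction here is only illustrative.

```python
import numpy as np

def lmsd_quadratic(A, b, x0, m=3, sweeps=20, tol=1e-10):
    """Toy limited-memory steepest descent sketch for f(x) = 0.5 x'Ax - b'x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        G = []                                      # store m back gradients
        for _ in range(m):
            g = A @ x - b
            if np.linalg.norm(g) < tol:
                return x
            G.append(g)
            x = x - (g @ g) / (g @ A @ g) * g       # exact line search (Cauchy) step
        Q, _ = np.linalg.qr(np.column_stack(G))     # orthonormal basis of the gradient span
        ritz = np.linalg.eigvalsh(Q.T @ A @ Q)      # Ritz values of A on that subspace
        for theta in sorted(ritz, reverse=True):    # reciprocals of Ritz values as stepsizes
            x = x - (A @ x - b) / theta
    return x

A = np.diag([1.0, 4.0, 9.0]); b = np.ones(3)
print(lmsd_quadratic(A, b, np.zeros(3)))            # approaches A^{-1} b
```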

Expected decrease for derivative-free algorithms using random subspaces

Derivative-free algorithms seek the minimum of a given function based only on function values queried at appropriate points. Although these methods are widely used in practice, their performance is known to worsen as the problem dimension increases. Recent advances in developing randomized derivative-free techniques have tackled this issue by working in low-dimensional subspaces that are … Read more
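A generic sketch of the subspace idea, assuming a simple direct-search framework with Gaussian random subspaces (not the specific algorithm or the expected-decrease analysis of the paper), might look as follows.

```python
import numpy as np

def random_subspace_search(f, x0, p=2, alpha=1.0, max_iter=500, seed=0):
    """Direct-search sketch that polls along directions drawn in a random
    p-dimensional subspace; illustration only."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        P = rng.standard_normal((x.size, p)) / np.sqrt(p)    # random subspace basis
        improved = False
        for d in np.hstack([P, -P]).T:                       # poll +/- subspace directions
            trial = x + alpha * d
            ft = f(trial)
            if ft < fx:
                x, fx, improved = trial, ft, True
                break
        alpha = 2.0 * alpha if improved else 0.5 * alpha     # expand / shrink the stepsize
    return x, fx

# Example: minimize a simple quadratic in R^10 using only function values.
f = lambda x: float(np.sum((x - 1.0) ** 2))
print(random_subspace_search(f, np.zeros(10))[1])
```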

Regularized methods via cubic subspace minimization for nonconvex optimization

The main computational cost per iteration of adaptive cubic regularization methods for solving large-scale nonconvex problems is the computation of the step \(s_k\), which requires an approximate minimizer of the cubic model. We propose a new approach in which this minimizer is sought in a low-dimensional subspace that, in contrast to classical approaches, … Read more
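As a rough illustration, the sketch below minimizes the cubic model over a generic low-dimensional subspace spanned by the gradient and a few random directions, using `scipy.optimize.minimize` on the reduced model; the subspace construction and the handling of the regularization parameter \(\sigma\) in the paper are different.

```python
import numpy as np
from scipy.optimize import minimize

def cubic_step_in_subspace(g, H, sigma, Q):
    """Approximately minimize the cubic model
        m(s) = g's + 0.5 s'Hs + (sigma/3)||s||^3
    over s = Q z, with Q an orthonormal basis of a low-dimensional subspace."""
    gQ, HQ = Q.T @ g, Q.T @ H @ Q
    m = lambda z: gQ @ z + 0.5 * z @ HQ @ z + sigma / 3.0 * np.linalg.norm(z) ** 3
    z = minimize(m, np.zeros(Q.shape[1])).x
    return Q @ z

# Example: 50-dimensional nonconvex model, 3-dimensional subspace containing g.
rng = np.random.default_rng(0)
H = rng.standard_normal((50, 50)); H = 0.5 * (H + H.T)
g = rng.standard_normal(50)
Q, _ = np.linalg.qr(np.column_stack([g, rng.standard_normal((50, 2))]))
s = cubic_step_in_subspace(g, H, sigma=1.0, Q=Q)
print(g @ s + 0.5 * s @ H @ s + np.linalg.norm(s) ** 3 / 3.0)  # model value (negative)
```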

Non-asymptotic superlinear convergence of Nesterov accelerated BFGS

This paper studies the convergence of a Nesterov accelerated variant of the Broyden-Fletcher-Goldfarb-Shanno (NA-BFGS) quasi-Newton method in the setting where the objective function is strongly convex, its gradient is Lipschitz continuous, and its Hessian is Lipschitz continuous at the optimal point. We demonstrate that, similar to the classic BFGS method, the Nesterov accelerated BFGS method … Read more
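A minimal sketch of one such accelerated quasi-Newton loop is given below, assuming a standard BFGS inverse update at a Nesterov-extrapolated point with an Armijo backtracking safeguard; the momentum schedule and step rule are illustrative, not the exact NA-BFGS scheme analysed in the paper.

```python
import numpy as np

def na_bfgs_sketch(f, grad, x0, iters=50, beta=0.3):
    """Illustrative Nesterov-accelerated BFGS loop (not the paper's exact scheme)."""
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()
    H = np.eye(x.size)                               # inverse Hessian approximation
    for _ in range(iters):
        y_ext = x + beta * (x - x_prev)              # Nesterov extrapolation
        g = grad(y_ext)
        d = -H @ g                                   # quasi-Newton direction
        t, fy = 1.0, f(y_ext)
        while f(y_ext + t * d) > fy + 1e-4 * t * (g @ d) and t > 1e-10:
            t *= 0.5                                 # Armijo backtracking
        x_new = y_ext + t * d
        s, yv = x_new - y_ext, grad(x_new) - g
        if s @ yv > 1e-12:                           # standard BFGS inverse update
            rho = 1.0 / (s @ yv)
            V = np.eye(x.size) - rho * np.outer(s, yv)
            H = V @ H @ V.T + rho * np.outer(s, s)
        x_prev, x = x, x_new
    return x

# Strongly convex quadratic test problem.
A = np.diag([1.0, 5.0, 25.0]); b = np.ones(3)
x = na_bfgs_sketch(lambda z: 0.5 * z @ A @ z - b @ z, lambda z: A @ z - b, np.zeros(3))
print(x)    # should approach A^{-1} b
```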

An Explicit Three-Term Polak-Ribière-Polyak Conjugate Gradient Method for Bicriteria Optimization

In this paper, we propose a Polak-Ribière-Polyak conjugate gradient type method for solving bicriteria optimization problems without resorting to scalarization techniques. Two particular advantages of this contribution are worth noting. First, the suggested descent direction, common to both criteria, can be computed directly by an explicit formula without solving any intermediate subproblem. Second, the descent … Read more
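For background, a standard single-objective three-term PRP direction (of Zhang-Zhou-Li type) is sketched below; it satisfies \(g_k^\top d_k = -\|g_k\|^2\) by construction. The bicriteria method of the paper uses its own explicit formula for a direction that descends for both criteria, which is not reproduced here.

```python
import numpy as np

def three_term_prp_direction(g, g_prev, d_prev):
    """Single-objective three-term PRP direction (background sketch only)."""
    y = g - g_prev
    denom = g_prev @ g_prev
    beta = (g @ y) / denom
    theta = (g @ d_prev) / denom
    d = -g + beta * d_prev - theta * y
    # By construction g @ d = -||g||^2, so d is a descent direction
    # independently of the line search.
    return d

g_prev = np.array([1.0, -2.0]); d_prev = -g_prev
g = np.array([0.3, 1.0])
d = three_term_prp_direction(g, g_prev, d_prev)
print(g @ d, -(g @ g))     # the two values coincide
```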

Yet another fast variant of Newton’s method for nonconvex optimization

A second-order algorithm is proposed for minimizing smooth nonconvex functions that alternates between regularized Newton and negative curvature steps. In most cases, the Hessian matrix is regularized with the square root of the current gradient norm and an additional term taking moderate negative curvature into account, a negative curvature step being taken only exceptionally. As … Read more
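A rough single-step sketch of this type of scheme follows, assuming illustrative constants and a simplified switching rule between the regularized Newton step and the negative curvature step; the paper's actual rule and regularization are more refined.

```python
import numpy as np

def regularized_newton_or_curvature_step(g, H):
    """One illustrative step: regularize H with sqrt(||g||) plus moderate
    negative curvature, falling back to a negative curvature direction only
    when the smallest eigenvalue is strongly negative (thresholds are
    illustrative, not the paper's)."""
    eigvals, eigvecs = np.linalg.eigh(H)
    lam_min = eigvals[0]
    if lam_min < -np.sqrt(np.linalg.norm(g)):        # exceptional case: strong negative curvature
        d = eigvecs[:, 0]
        return -np.sign(d @ g) * d if d @ g != 0 else d
    reg = np.sqrt(np.linalg.norm(g)) + max(0.0, -lam_min)
    return np.linalg.solve(H + reg * np.eye(H.shape[0]), -g)   # regularized Newton step

# Example with an indefinite Hessian.
H = np.array([[2.0, 0.0], [0.0, -0.5]])
g = np.array([1.0, 0.2])
print(regularized_newton_or_curvature_step(g, H))
```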