Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation

We apply PAC-Bayesian theory to the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-Bayesian bounds) and an explicit trade-off between convergence guarantees and convergence speed, which contrasts with the typical worst-case analysis. Our learned optimization algorithms provably outperform related ones …

Accelerated Gradient Dynamics on Riemannian Manifolds: Faster Rate and Trajectory Convergence

To minimize a differentiable geodesically convex function, we study a second-order dynamical system on Riemannian manifolds with an asymptotically vanishing damping term of the form \(\alpha/t\). For positive values of \(\alpha\), convergence rates for the objective values and convergence of the trajectory are derived. We emphasize the crucial role of the curvature of the …
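
For orientation, a second-order system with vanishing damping \(\alpha/t\) is usually written as follows. This is a sketch under assumptions, not necessarily the exact system studied in the paper: the symbols \(f\), \(x(t)\), \(\nabla_{\dot{x}}\dot{x}\) (covariant acceleration) and \(\operatorname{grad} f\) (Riemannian gradient) are notation introduced here for illustration.

```latex
% Euclidean special case (flat space): the well-known ODE with damping alpha/t
\ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \nabla f\bigl(x(t)\bigr) = 0, \qquad t > 0.

% Assumed Riemannian analogue: the second derivative is replaced by the
% covariant derivative of the velocity along the curve, and the Euclidean
% gradient by the Riemannian gradient.
\nabla_{\dot{x}(t)}\dot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \operatorname{grad} f\bigl(x(t)\bigr) = 0, \qquad t > 0.
```

In the Euclidean case with convex \(f\) and \(\alpha \ge 3\), the classical result is \(f(x(t)) - \min f = O(1/t^2)\); how curvature changes this picture is the subject of the paper itself.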

Near-optimal closed-loop method via Lyapunov damping for convex optimization

We introduce an autonomous system with closed-loop damping for first-order convex optimization. While, to this day, optimal rates of convergence are only achieved by non-autonomous methods via open-loop damping (e.g., Nesterov’s algorithm), we show that our system is the first one featuring a closed-loop damping while exhibiting a rate arbitrarily close to the optimal one. …

Fixed-Point Automatic Differentiation of Forward–Backward Splitting Algorithms for Partly Smooth Functions

A large class of practical non-smooth optimization problems can be written as the minimization of a sum of smooth and partly smooth functions. We consider such structured problems that also depend on a parameter vector and study the problem of differentiating their solution mapping with respect to the parameter, which has far-reaching applications in sensitivity …

An abstract convergence framework with application to inertial inexact forward-backward methods

In this paper we introduce a novel abstract descent scheme suited for the minimization of proper and lower semicontinuous functions. The proposed abstract scheme generalizes a set of properties that are crucial for the convergence of several first-order methods designed for nonsmooth nonconvex optimization problems. Such properties guarantee the convergence of the full sequence of …

Beyond Alternating Updates for Matrix Factorization with Inertial Bregman Proximal Gradient Algorithms

Matrix Factorization is a popular non-convex objective, for which alternating minimization schemes are mostly used. They usually suffer from the major drawback that the solution is biased towards one of the optimization variables. A remedy is non-alternating schemes. However, due to a lack of Lipschitz continuity of the gradient in matrix factorization problems, convergence cannot …

Convex-Concave Backtracking for Inertial Bregman Proximal Gradient Algorithms in Non-Convex Optimization

Backtracking line-search is an old yet powerful strategy for finding a better step size to be used in proximal gradient algorithms. The main principle is to locally find a simple convex upper bound of the objective function, which in turn controls the step size that is used. In the case of inertial proximal gradient algorithms, the situation …
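
The backtracking principle described above can be sketched for the plain (non-inertial, Euclidean) proximal gradient method. This is a minimal illustration, not the convex-concave scheme of the paper: the test problem, the function names, and the doubling rule for the constant `L` are assumptions made here. The loop increases `L` until the quadratic model built at the current point is an upper bound at the candidate point.

```python
def soft_threshold(v, t):
    """Proximal map of t * |.| evaluated at the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_grad_backtracking(f, grad_f, x0, lam, L0=1.0, iters=200, tol=1e-12):
    """Minimize f(x) + lam * ||x||_1 by proximal gradient with backtracking on L."""
    x = list(x0)
    L = L0
    for _ in range(iters):
        g = grad_f(x)
        fx = f(x)
        while True:
            # Candidate step with step size 1/L.
            x_new = [soft_threshold(xi - gi / L, lam / L) for xi, gi in zip(x, g)]
            d = [xn - xi for xn, xi in zip(x_new, x)]
            # Quadratic upper model of f around x, evaluated at x_new.
            upper = (fx
                     + sum(gi * di for gi, di in zip(g, d))
                     + 0.5 * L * sum(di * di for di in d))
            if f(x_new) <= upper + tol:
                break  # model is an upper bound: accept the step
            L *= 2.0   # otherwise shrink the step size and retry
        x = x_new
    return x, L


# Hypothetical test problem: f(x) = 0.5 * sum(a_i * (x_i - b_i)^2).
A = [1.0, 4.0]
B = [3.0, -0.1]


def f(x):
    return 0.5 * sum(a * (xi - b) ** 2 for a, xi, b in zip(A, x, B))


def grad_f(x):
    return [a * (xi - b) for a, xi, b in zip(A, x, B)]


x_bt, L_bt = prox_grad_backtracking(f, grad_f, [0.0, 1.0], lam=1.0)
```

Because the problem is separable, the minimizer is known in closed form (each coordinate is `soft_threshold(b_i, lam / a_i)`), which makes the sketch easy to check by hand.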

On Quasi-Newton Forward–Backward Splitting: Proximal Calculus and Convergence

We introduce a framework for quasi-Newton forward–backward splitting algorithms (proximal quasi-Newton methods) with a metric induced by diagonal ± rank-r symmetric positive definite matrices. This special type of metric allows for a highly efficient evaluation of the proximal mapping. The key to this efficiency is a general proximal calculus in the new metric. By using …

Unifying abstract inexact convergence theorems and block coordinate variable metric iPiano

An abstract convergence theorem for a class of generalized descent methods that explicitly models relative errors is proved. The convergence theorem generalizes and unifies several recent abstract convergence theorems. It is applicable to possibly non-smooth and non-convex lower semi-continuous functions that satisfy the Kurdyka–Lojasiewicz (KL) inequality, which comprises a huge class of problems. Most of …

Adaptive FISTA

In this paper we propose an adaptively extrapolated proximal gradient method based on the accelerated proximal gradient method (also known as FISTA); however, we locally optimize the extrapolation parameter by carrying out an exact (or inexact) line search. It turns out that in some situations, the proposed algorithm is equivalent to a class …
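
For reference, the baseline that the abstract builds on can be sketched as standard FISTA with a fixed step size `1/L` and the usual extrapolation rule. This is not the adaptive method of the paper: the line search over the extrapolation parameter is not implemented, and the test problem and all names below are assumptions made for illustration. The line computing `beta` is the quantity the adaptive variant would choose by line search instead.

```python
import math


def soft_threshold(v, t):
    """Proximal map of t * |.| evaluated at the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fista(grad_f, x0, lam, L, iters=300):
    """Standard FISTA for min f(x) + lam * ||x||_1, with L a Lipschitz constant of grad f."""
    x_prev = list(x0)
    x = list(x0)
    t = 1.0
    for _ in range(iters):
        t_next = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_next  # fixed extrapolation rule (not adapted)
        # Extrapolated point built from the last two iterates.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad_f(y)
        x_prev = x
        # Proximal gradient step taken from the extrapolated point.
        x = [soft_threshold(yi - gi / L, lam / L) for yi, gi in zip(y, g)]
        t = t_next
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(a_i * (x_i - b_i)^2), so L = max(a_i).
A = [1.0, 4.0]
B = [3.0, -0.1]


def grad_f(x):
    return [a * (xi - b) for a, xi, b in zip(A, x, B)]


x_fista = fista(grad_f, [0.0, 1.0], lam=1.0, L=4.0)
```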