On Relatively Smooth Optimization over Riemannian Manifolds

We study optimization over Riemannian embedded submanifolds, where the objective function is relatively smooth in the ambient Euclidean space. Such problems have broad applications but are still largely unexplored. We introduce two Riemannian first-order methods, namely the retraction-based and projection-based Riemannian Bregman gradient methods, by incorporating the Bregman distance into the update steps. The retraction-based … Read more

Swapping objectives accelerates Davis-Yin splitting

In this work, we investigate the application of Davis–Yin splitting (DYS) to convex optimization problems and demonstrate that swapping the roles of the two nonsmooth convex functions can result in a faster convergence rate. Such a swap typically yields a different sequence of iterates, but its impact on convergence behavior has been largely understudied or … Read more

An Adaptive and Parameter-Free Nesterov’s Accelerated Gradient Method for Convex Optimization

We propose AdaNAG, an adaptive accelerated gradient method based on Nesterov’s accelerated gradient method. AdaNAG is line-search-free, parameter-free, and achieves the accelerated convergence rates \( f(x_k) – f_\star = \mathcal{O}\left(1/k^2\right) \) and \( \min_{i\in\left\{1,\dots, k\right\}} \|\nabla f(x_i)\|^2 = \mathcal{O}\left(1/k^3\right) \) for \( L \)-smooth convex function \( f \). We provide a Lyapunov analysis for … Read more

Fully First-Order Methods for Decentralized Bilevel Optimization

This paper focuses on decentralized stochastic bilevel optimization (DSBO) where agents only communicate with their neighbors. We propose Decentralized Stochastic Gradient Descent and Ascent with Gradient Tracking (DSGDA-GT), a novel algorithm that only requires first-order oracles that are much cheaper than second-order oracles widely adopted in existing works. We further provide a finite-time convergence analysis … Read more

Single-Timescale Multi-Sequence Stochastic Approximation Without Fixed Point Smoothness: Theories and Applications

Stochastic approximation (SA) that involves multiple coupled sequences, known as multiple-sequence SA (MSSA), finds diverse applications in the fields of signal processing and machine learning. However, existing theoretical understandings of MSSA are limited: the multi-timescale analysis implies a slow convergence rate, whereas the single-timescale analysis relies on a stringent fixed point smoothness assumption. This paper … Read more

Adaptive Algorithms for Robust Phase Retrieval

This paper considers the robust phase retrieval, which can be cast as a nonsmooth and nonconvex optimization problem. We propose two first-order algorithms with adaptive step sizes: the subgradient algorithm (AdaSubGrad) and the inexact proximal linear algorithm (AdaIPL). Our contribution lies in the novel design of adaptive step sizes based on quantiles of the absolute … Read more

Relaxed Proximal Point Algorithm: Tight Complexity Bounds and Acceleration without Momentum

In this paper, we focus on the relaxed proximal point algorithm (RPPA) for solving convex (possibly nonsmooth) optimization problems. We conduct a comprehensive study on three types of relaxation schedules: (i) constant schedule with relaxation parameter \(\alpha_k\equiv \alpha \in (0, \sqrt{2}]\), (ii) the dynamic schedule put forward by Teboulle and Vaisbourd [TV23], and (iii) the … Read more

Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis

Bilevel optimization has recently attracted considerable attention due to its abundant applications in machine learning problems. However, existing methods rely on prior knowledge of problem parameters to determine stepsizes, resulting in significant effort in tuning stepsizes when these parameters are unknown. In this paper, we propose two novel tuning-free algorithms, D-TFBO and S-TFBO. D-TFBO employs … Read more

Problem-Parameter-Free Decentralized Nonconvex Stochastic Optimization

Existing decentralized algorithms usually require knowledge of problem parameters for updating local iterates. For example, the hyperparameters (such as learning rate) usually require the knowledge of Lipschitz constant of the global gradient or topological information of the communication networks, which are usually not accessible in practice. In this paper, we propose D-NASA, the first algorithm … Read more

Riemannian Bilevel Optimization

In this work, we consider the bilevel optimization problem on Riemannian manifolds. We inspect the calculation of the hypergradient of such problems on general manifolds and thus enable the utilization of gradient-based algorithms to solve such problems. The calculation of the hypergradient requires utilizing the notion of Riemannian cross-derivative and we inspect the properties and … Read more