We consider the problem of minimizing the sum of two convex functions: one is differentiable and relatively smooth with respect to a reference convex function, and the other can be nondifferentiable but simple to optimize. The relatively smooth condition is much weaker than the standard assumption of uniform Lipschitz continuity of the gradients, thus significantly increases the scope of potential applications. We present accelerated Bregman proximal gradient (ABPG) methods that employ the Bregman distance of the reference function as the proximity measure. These methods attain an $O(k^{-\gamma})$ convergence rate in the relatively smooth setting, where $\gamma\in [1,2]$ is determined by a triangle scaling property of the Bregman distance. We develop adaptive variants of the ABPG method that automatically ensure the best possible rate of convergence and argue that the $O(k^{-2})$ rate is attainable in most cases. We present numerical experiments with three applications: D-optimal experiment design, Poisson linear inverse problem, and relative-entropy nonnegative regression. In all experiments, we obtain numerical certificates showing that these methods do converge with the $O(k^{-2})$ rate.
Citation
Microsoft Research Technical Report: MSR-TR-2018-22
Article
View Accelerated Bregman Proximal Gradient Methods for Relatively Smooth Convex Optimization