The usual approach to developing and analyzing first-order methods for smooth convex optimization assumes that the gradient of the objective function is uniformly smooth with some Lipschitz constant L. However, in many settings the differentiable convex function f(.) is not uniformly smooth — for example in D-optimal design where f(x):=-ln det(HXH^T), or even the univariate setting with f(x) := -ln(x) + x^2. Herein we develop a notion of “relative smoothness” and relative strong convexity that is determined relative to a user-specified “reference function” h(.) (that should be computationally tractable for algorithms), and we show that many differentiable convex functions are relatively smooth with respect to a correspondingly fairly-simple reference function h(.). We extend two standard algorithms — the primal gradient scheme and the dual averaging scheme — to our new setting, with associated computational guarantees. We apply our new approach to develop a new first-order method for the D-optimal design problem, with associated computational complexity analysis. Some of our results have a certain overlap with the recent work of Bauschke, Bolte, and Teboulle.
Citation
MIT Operations Research Center Working Paper MIT