An $O(s^r)$-Resolution ODE Framework for Understanding Discrete-Time Algorithms and Applications to the Linear Convergence of Minimax Problems

There is a long history of using ordinary differential equations (ODEs) to understand the dynamics of discrete-time algorithms (DTAs). Surprisingly, two fundamental questions remain unanswered: (i) how to obtain a \emph{suitable} ODE from a given DTA, and (ii) what the connection is between the convergence of a DTA and that of its corresponding ODEs. In this paper, we propose new machinery, an $O(s^r)$-resolution ODE framework, for analyzing the behavior of a generic DTA, which (partially) answers the above two questions. The framework contains three steps: (1) to obtain a suitable ODE from a given DTA, we define a hierarchy of $O(s^r)$-resolution ODEs of a DTA parameterized by the degree $r$, where $s$ is the step-size of the DTA, and present a principled approach to constructing the unique $O(s^r)$-resolution ODEs of a DTA; (2) to analyze the resulting ODE, we propose the $O(s^r)$-linear-convergence condition of a DTA with respect to an energy function, under which the $O(s^r)$-resolution ODE converges linearly to an optimal solution; (3) to bridge the convergence properties of a DTA and its corresponding ODEs, we define the properness of an energy function and show that linear convergence of the $O(s^r)$-resolution ODE with respect to a proper energy function automatically guarantees linear convergence of the DTA. To illustrate this machinery, we use it to study three classic algorithms for solving the unconstrained minimax problem $\min_{x\in\mathbb{R}^n} \max_{y\in \mathbb{R}^m} L(x,y)$: gradient descent ascent (GDA), the proximal point method (PPM), and the extra-gradient method (EGM). Their $O(s)$-resolution ODEs explain the puzzling convergent/divergent behaviors of GDA, PPM, and EGM when $L(x,y)$ is a bilinear function, and show that the interaction terms help the convergence of PPM/EGM but hurt the convergence of GDA. Furthermore, their $O(s)$-linear-convergence conditions not only unify the known scenarios in which PPM and EGM converge linearly, but also show that these two algorithms exhibit linear convergence in much broader contexts, including when solving a class of nonconvex-nonconcave minimax problems. Finally, we show how this ODE framework can help design new optimization algorithms for minimax problems by studying the difference between the $O(s)$-resolution ODE of GDA and that of PPM/EGM.
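To make the bilinear example concrete, below is a minimal numerical sketch (not taken from the paper; the step size, starting point, and iteration count are arbitrary illustrative choices) of the three updates on $L(x,y) = xy$, whose unique saddle point is the origin. It reproduces the behavior the abstract describes: GDA spirals away from the saddle point, while PPM and EGM contract toward it.

```python
# Illustrative sketch: GDA vs. PPM vs. EGM on the bilinear problem L(x, y) = x * y.
# Step size s, starting point, and iteration count are arbitrary assumptions.
import numpy as np

s = 0.1   # step size
T = 200   # number of iterations

def gda(z):
    # Gradient descent ascent: descend in x, ascend in y, using the current point.
    x, y = z
    return np.array([x - s * y, y + s * x])

def ppm(z):
    # Proximal point method: the implicit update x+ = x - s*y+, y+ = y + s*x+
    # is a 2x2 linear system for L(x, y) = x*y, solved here in closed form.
    M = np.array([[1.0, s], [-s, 1.0]])
    return np.linalg.solve(M, z)

def egm(z):
    # Extra-gradient method: take a gradient step evaluated at an extrapolated point.
    x, y = z
    xh, yh = x - s * y, y + s * x              # half (extrapolation) step
    return np.array([x - s * yh, y + s * xh])  # full step using the half-step gradient

z0 = np.array([1.0, 1.0])
for name, step in [("GDA", gda), ("PPM", ppm), ("EGM", egm)]:
    z = z0.copy()
    for _ in range(T):
        z = step(z)
    print(f"{name}: distance to saddle point after {T} steps = {np.linalg.norm(z):.3e}")
# Expected output: the GDA iterate drifts away from the origin (distance grows),
# while the PPM and EGM iterates shrink toward it.
```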

Citation

arXiv preprint arXiv:2001.08826 (2020)
