Conjugate gradient minimization methods (CGM) and their accelerated variants are widely used in machine learning applications. We focus on the use of cubic regularization to improve the CGM search direction independently of the steplength (learning rate) computation. Using Shanno’s reformulation of CGM as a memoryless BFGS method, we derive new formulas for the regularized step direction, which can be evaluated without additional computational effort. The new step directions are shown to reduce iteration counts and runtimes and to lessen the need to restart the CGM.
Citation
Department of Decision Sciences and MIS, Bennett S. LeBow College of Business, Drexel University, Philadelphia, PA 19104. Sept 2021.
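For readers unfamiliar with the baseline method, the following is a minimal sketch of a standard (unregularized) nonlinear CGM iteration, using the Polak–Ribière direction update with an Armijo backtracking line search and a steepest-descent restart. This illustrates only the classical method the abstract builds on, not the paper's cubic-regularized step directions; the function name and parameter choices are illustrative assumptions.

```python
import numpy as np

def cg_minimize(f, grad, x0, max_iter=200, tol=1e-6):
    """Illustrative nonlinear CGM: Polak-Ribiere(+) updates with an
    Armijo backtracking line search and a steepest-descent restart.
    Not the paper's cubic-regularized variant."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g  # initial direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Restart with steepest descent if d fails to be a descent direction.
        if g @ d >= 0:
            d = -g
        # Armijo backtracking to choose the steplength (learning rate).
        alpha, c = 1.0, 1e-4
        while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
            alpha *= 0.5
            if alpha < 1e-12:
                break
        x_new = x + alpha * d
        g_new = grad(x_new)
        # Polak-Ribiere beta, clipped at zero (the "PR+" safeguard).
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x
```

On a small convex quadratic, `cg_minimize` recovers the exact minimizer; the restart branch is exactly the kind of safeguard whose frequency the regularized step directions are intended to reduce.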