This paper describes how to incorporate stochastic curvature information into a Newton-CG method and a limited-memory quasi-Newton method for large-scale optimization. The motivation for this work stems from statistical learning and stochastic optimization applications in which the objective function is the sum of a very large number of loss terms and can be evaluated with varying degrees of precision. Curvature information is incorporated into the two proposed semi-stochastic algorithms via a matrix-free conjugate gradient iteration, applied to a linear system whose coefficient matrix is a sampled (or stochastic) Hessian computed from a small batch. The efficiency of the proposed methods is illustrated on a machine learning application involving speech recognition.
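The core idea in the Newton-CG variant can be sketched in a few lines: the gradient is computed from the full (or a large) sample, while the conjugate gradient solver sees the Hessian only through Hessian-vector products evaluated on a small sampled batch, so the Hessian matrix is never formed. The following is a minimal illustration on logistic regression, not the authors' implementation; the damping term, batch size, and unit step length are assumptions for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    # average logistic loss over the rows of X
    p = sigmoid(X @ w)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def grad(w, X, y):
    # gradient of the average logistic loss
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def hess_vec(w, v, X):
    # matrix-free Hessian-vector product for logistic loss:
    # H v = (1/m) X^T diag(p(1-p)) X v  -- H is never formed explicitly
    p = sigmoid(X @ w)
    d = p * (1 - p)
    return X.T @ (d * (X @ v)) / len(p)

def cg(hv, b, max_iter=20, tol=1e-8):
    # standard conjugate gradient on H x = b, using only H-vector products
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Hp = hv(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def subsampled_newton_cg(X, y, iters=15, batch=64, seed=0):
    # gradient from the full sample; curvature from a small sampled batch
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = grad(w, X, y)
        idx = rng.choice(len(y), size=batch, replace=False)
        Xs = X[idx]  # small batch defining the stochastic Hessian
        # small damping keeps the sampled Hessian safely positive definite
        d = cg(lambda v: hess_vec(w, v, Xs) + 1e-4 * v, -g)
        w += d  # unit step for simplicity; a line search would be used in practice
    return w
```

Because the sampled Hessian is cheaper per product than the full Hessian by roughly the ratio of batch size to dataset size, the CG inner loop becomes inexpensive while the search direction still carries curvature information.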
Citation
Technical Report, Optimization Center, Northwestern University, May 2010 (unpublished).