An Adaptive Sampling Sequential Quadratic Programming Method for Equality Constrained Stochastic Optimization

This paper presents a methodology for using varying sample sizes in sequential quadratic programming (SQP) methods for solving equality constrained stochastic optimization problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the gradient in conjunction with inexact solutions to the SQP subproblems. Under reasonable … Read more

Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization

We consider unconstrained stochastic optimization problems with no available gradient information. Such problems arise in settings from derivative-free simulation optimization to reinforcement learning. We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a stochastic function using finite differences within a common random number framework. We develop modified versions of a norm … Read more

Constrained and Composite Optimization via Adaptive Sampling Methods

The motivation for this paper stems from the desire to develop an adaptive sampling method for solving constrained optimization problems in which the objective function is stochastic and the constraints are deterministic. The method proposed in this paper is a proximal gradient method that can also be applied to the composite optimization problem min f(x) … Read more

Adaptive Sampling Quasi-Newton Methods for Derivative-Free Stochastic Optimization

We consider stochastic zero-order optimization problems, which arise in settings from simulation optimization to reinforcement learning. We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a stochastic function using finite differences within a common random number framework. We employ modified versions of a norm test and an inner product quasi-Newton test … Read more

A Progressive Batching L-BFGS Method for Machine Learning

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise … Read more

Adaptive Sampling Strategies for Stochastic Optimization

In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the regular computation of full gradients, the proposed method reduces variance by increasing the sample size as needed. The decision to increase … Read more

Balancing Communication and Computation in Distributed Optimization

Methods for distributed optimization have received significant attention in recent years owing to their wide applicability in various domains including machine learning, robotics and sensor networks. A distributed optimization method typically consists of two key components: communication and computation. More specifically, at every iteration (or every several iterations) of a distributed algorithm, each node in … Read more

An Investigation of Newton-Sketch and Subsampled Newton Methods

Sketching, a dimensionality reduction technique, has received much attention in the statistics community. In this paper, we study sketching in the context of Newton’s method for solving finite-sum optimization problems in which the number of variables and data points are both large. We study two forms of sketching that perform dimensionality reduction in data space: … Read more

Exact and Inexact Subsampled Newton Methods for Optimization

The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how to coordinate the accuracy in the gradient and Hessian to yield a superlinear rate of convergence in expectation. The second part of … Read more