Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates

In this paper, we study smooth stochastic multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions. We assume access to noisy evaluations of the functions and their gradients, through a stochastic first-order oracle. For solving this class of problems, we propose two algorithms using moving-average stochastic estimates, and analyze … Read more

Stochastic Zeroth-order Riemannian Derivative Estimation and Optimization

We consider stochastic zeroth-order optimization over Riemannian submanifolds embedded in Euclidean space, where the task is to solve Riemannian optimization problem with only noisy objective function evaluations. Towards this, our main contribution is to propose estimators of the Riemannian gradient and Hessian from noisy objective function evaluations, based on a Riemannian version of the Gaussian … Read more

Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities

In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly-concave in the other variable. Such minimax optimization problems have attracted significant attention lately due to their applications in modern machine learning tasks. We first design and analyze the Zeroth-Order Gradient Descent Ascent (ZO-GDA) algorithm, and provide … Read more

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Structured non-convex learning problems, for which critical points have favorable statistical properties, arise frequently in statistical machine learning. Algorithmic convergence and statistical estimation rates are well-understood for such problems. However, quantifying the uncertainty associated with the underlying training algorithm is not well-studied in the non-convex setting. In order to address this short-coming, in this work, … Read more

Normal Approximation for Stochastic Gradient Descent via Non-Asymptotic Rates of Martingale CLT

We provide non-asymptotic convergence rates of the Polyak-Ruppert averaged stochastic gradient descent (SGD) to a normal random vector for a class of twice-differentiable test functions. A crucial intermediate step is proving a non-asymptotic martingale central limit theorem (CLT), i.e., establishing the rates of convergence of a multivariate martingale difference sequence to a normal random vector, … Read more

Non-asymptotic Results for Langevin Monte Carlo: Coordinate-wise and Black-box Sampling

Euler-Maruyama and Ozaki discretization of a continuous time diffusion process is a popular technique for sampling, that uses (upto) gradient and Hessian information of the density respectively. The Euler-Maruyama discretization has been used particularly for sampling under the name of Langevin Monte Carlo (LMC) for sampling from strongly log-concave densities. In this work, we make … Read more

Zeroth-order Nonconvex Stochastic Optimization: Handling Constraints, High-Dimensionality, and Saddle-Points

In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on addressing constrained optimization, high-dimensional setting, and saddle-point avoiding. To handle constrained optimization, we first propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. To … Read more