variance reduction – Page 2 – Optimization Online

Optimization for Supervised Machine Learning: Randomized Algorithms for Data and Parameters

Published: 2020/08/13

Categories: Convex Optimization, Stochastic Programming

Tags: coordinate descent, cubic newton, machine learning, optimization, stochastic gradient, variance reduction

Many key problems in machine learning and data science are routinely modeled as optimization problems and solved via optimization algorithms. With the increase of the volume of data and the size and complexity of the statistical models used to formulate these often ill-conditioned optimization tasks, there is a need for new efficient algorithms able to …

Stochastic Variance-Reduced Prox-Linear Algorithms for Nonconvex Composite Optimization

Published: 2020/04/08

Lin Xiao

Junyu Zhang

Categories: Nonlinear Optimization, Nonsmooth Optimization, Stochastic Programming

Tags: nonsmooth optimization, prox-linear algorithm, sample complexity, stochastic composite optimization, variance reduction

We consider minimization of composite functions of the form $f(g(x))+h(x)$, where $f$ and $h$ are convex functions (which can be nonsmooth) and $g$ is a smooth vector mapping. In addition, we assume that $g$ is the average of a finite number of component mappings or the expectation over a family of random component mappings. We propose …
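Written out, the problem class described in this abstract could look as follows. This is only a sketch of the stated setting: the symbols $n$, $g_i$, $\xi$ and $g_\xi$ are notation assumed here for illustration and do not appear in the excerpt.

```latex
\min_{x} \; f\bigl(g(x)\bigr) + h(x),
\qquad
g(x) = \frac{1}{n} \sum_{i=1}^{n} g_i(x)
\quad \text{or} \quad
g(x) = \mathbb{E}_{\xi}\bigl[ g_{\xi}(x) \bigr],
```

with $f$ and $h$ convex and possibly nonsmooth, and each component mapping smooth.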

Variance Reduction of Stochastic Gradients Without Full Gradient Evaluation

Published: 2020/03/11

Florian Jarre

Felix Lieder

Categories: Unconstrained Optimization

Tags: stochastic gradients, variance reduction

A standard concept for reducing the variance of stochastic gradient approximations is based on full gradient evaluations every now and then. In this paper, an approach is considered that, while approximating a local minimizer of a sum of functions, also generates approximations of the gradient and the function values without relying on full …
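The "standard concept" mentioned first, periodic full gradient evaluations, is commonly associated with SVRG-style methods. The block below is a minimal sketch of that baseline idea only, not of the approach proposed in this paper. The scalar least-squares objective, the function name `svrg_least_squares`, the step size, and the inner loop length of `2 * n` are all assumptions chosen for illustration.

```python
import random


def svrg_least_squares(a, b, x0=0.0, step=0.05, epochs=30, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (a[i] * x - b[i])**2 over a scalar x."""
    rng = random.Random(seed)
    n = len(a)

    def grad_i(i, x):
        # Gradient of the i-th component 0.5 * (a_i * x - b_i)^2.
        return a[i] * (a[i] * x - b[i])

    x_ref = x0
    for _ in range(epochs):
        # Full gradient at the reference point: the periodic, costly step.
        full_grad = sum(grad_i(i, x_ref) for i in range(n)) / n
        x = x_ref
        for _ in range(2 * n):
            i = rng.randrange(n)
            # Variance-reduced estimate: unbiased for the full gradient at x,
            # with variance that shrinks as x and x_ref near the minimizer.
            g = grad_i(i, x) - grad_i(i, x_ref) + full_grad
            x -= step * g
        x_ref = x
    return x


a = [1.0, 2.0, 0.5, 1.5]
b = [2.0, 3.9, 1.1, 3.0]
x_hat = svrg_least_squares(a, b)
# Closed-form minimizer of the scalar least-squares problem, for comparison.
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
print(x_hat, x_star)
```

Each outer pass costs `n` component gradients for the reference point, which is the expense that an approach without full gradient evaluations aims to avoid.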

Inexact proximal stochastic second-order methods for nonconvex composite optimization

Published: 2019/10/10, Updated: 2019/10/14

Xiao Wang

Hongchao Zhang

Categories: Nonlinear Optimization

Tags: (weakly) smooth function, complexity, inexact subproblem solution, nonconvex, second-order approximation, stochastic gradient, variance reduction

In this paper, we propose a framework of Inexact Proximal Stochastic Second-order (IPSS) methods for solving nonconvex optimization problems, whose objective function consists of an average of finitely many, possibly weakly, smooth functions and a convex but possibly nonsmooth function. At each iteration, IPSS inexactly solves a proximal subproblem constructed by using some positive …

A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine Learning

Published: 2016/09/21, Updated: 2016/10/04

Aleksandr Y. Aravkin

Damek Davis

Categories: Nonlinear Optimization, Nonsmooth Optimization, Statistics

Tags: machine learning, nonconvex optimization, saga, smart, svrg, trimmed estimators, variance reduction

Machine learning theory typically assumes that training data is unbiased and not adversarially generated. When real training data deviates from these assumptions, trained models make erroneous predictions, sometimes with disastrous effects. Robust losses, such as the Huber norm, are designed to mitigate the effects of such contaminated data, but they are limited to the regression …
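For readers unfamiliar with the Huber loss named above, the following sketch shows its usual textbook form: quadratic for small residuals and linear for large ones. The function name `huber` and the threshold parameter `delta` are assumptions for illustration and are not taken from the paper.

```python
def huber(r, delta=1.0):
    """Huber loss of a residual r with threshold delta > 0."""
    a = abs(r)
    if a <= delta:
        # Small residuals are penalized quadratically, like least squares.
        return 0.5 * r * r
    # Large residuals are penalized linearly, which limits outlier influence.
    return delta * (a - 0.5 * delta)


# A gross outlier contributes far less than under the squared loss.
outlier = 100.0
squared_penalty = 0.5 * outlier ** 2
huber_penalty = huber(outlier)
print(squared_penalty, huber_penalty)
```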

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Lojasiewicz Condition

Published: 2016/08/16, Updated: 2020/09/12

Hamed Karimi

Mark Schmidt

Julie Nutini

Categories: Convex and Nonsmooth Optimization

Tags: boosting, coordinate descent, gradient descent, l1-regularization, least squares, logistic regression, stochastic gradient, support vector machines, variance reduction

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Lojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Lojasiewicz (PL) …
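The PL inequality is usually written as $\frac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^*)$ for some $\mu > 0$. For an $L$-smooth function, gradient descent with step size $1/L$ then satisfies $f(x_k) - f^* \le (1 - \mu/L)^k (f(x_0) - f^*)$. The sketch below checks this bound on a small least-squares problem that is convex but not strongly convex. The matrix, the starting point, and the constants `MU = 2` and `L = 4` (the smallest nonzero and the largest eigenvalue of $A^T A$) are assumptions chosen for illustration.

```python
def f(x):
    # f(x) = 0.5 * ||A x - b||^2 with A = [[1, 0, 1], [0, 2, 0]], b = [1, 2].
    r1 = x[0] + x[2] - 1.0
    r2 = 2.0 * x[1] - 2.0
    return 0.5 * (r1 * r1 + r2 * r2)


def grad(x):
    # Gradient A^T (A x - b) of the least-squares objective above.
    r1 = x[0] + x[2] - 1.0
    r2 = 2.0 * x[1] - 2.0
    return [r1, 2.0 * r2, r1]


def gradient_descent(x0, lipschitz, iters=10):
    x = list(x0)
    values = [f(x)]
    for _ in range(iters):
        g = grad(x)
        # Fixed step size 1/L.
        x = [xi - gi / lipschitz for xi, gi in zip(x, g)]
        values.append(f(x))
    return x, values


MU, L = 2.0, 4.0
x_final, values = gradient_descent([3.0, -1.0, 2.0], lipschitz=L)
# Here f* = 0, so the PL rate bounds f(x_k) by (1 - MU/L)^k * f(x_0).
bounds = [(1.0 - MU / L) ** k * values[0] for k in range(len(values))]
for k, (v, bound) in enumerate(zip(values, bounds)):
    print(k, v, bound)
```

The Hessian $A^T A$ is singular in this example, so the linear rate cannot be explained by strong convexity.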

SMART: The Stochastic Monotone Aggregated Root-Finding Algorithm

Published: 2015/12/10, Updated: 2015/12/29

Damek Davis

Categories: Convex and Nonsmooth Optimization

Tags: aggregated gradient, asynchronous updates, coordinate updates, operator splitting, stochastic algorithm, variance reduction

We introduce the Stochastic Monotone Aggregated Root-Finding (SMART) algorithm, a new randomized operator-splitting scheme for finding roots of finite sums of operators. These algorithms are similar to the growing class of incremental aggregated gradient algorithms, which minimize finite sums of functions; the difference is that we replace gradients of functions with black-boxes called operators, which …

Importance Sampling in Stochastic Programming: A Markov Chain Monte Carlo Approach

Published: 2012/08/06, Updated: 2013/11/05

Categories: Stochastic Programming

Tags: benders decomposition, importance sampling, markov chain monte carlo, stochastic programming, variance reduction

Stochastic programming models are large-scale optimization problems that are used to facilitate decision-making under uncertainty. Optimization algorithms for such problems need to evaluate the expected future costs of current decisions, often referred to as the recourse function. In practice, this calculation is computationally difficult as it requires the evaluation of a multidimensional integral whose integrand …
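As background for the title, the following sketch illustrates plain importance sampling on a one-dimensional toy expectation. It does not reproduce the paper's Markov chain Monte Carlo construction. The target quantity (a Gaussian tail probability), the shifted Gaussian proposal, and the function name `tail_prob_importance_sampling` are assumptions chosen for illustration.

```python
import math
import random


def tail_prob_importance_sampling(threshold=3.0, n=20000, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) using a shifted proposal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Draw from the proposal q = N(threshold, 1), which concentrates
        # samples in the region that dominates the expectation.
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio p(x) / q(x) for p = N(0, 1), q = N(threshold, 1).
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


estimate = tail_prob_importance_sampling()
# Exact Gaussian tail probability, for comparison.
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
print(estimate, exact)
```

Sampling directly from N(0, 1) would place only about one draw in a thousand beyond the threshold, so most of the sampling effort would contribute nothing to the estimate.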