machine learning – Page 8 – Optimization Online

Random Sampling and Machine Learning to Understand Good Decompositions

Published: 2017/03/25, Updated: 2017/05/19

(Mixed) Integer Linear Programming, Data-Mining dantzig-wolfe decomposition, machine learning, random sampling

Motivated by its implications in the development of general-purpose solvers for decomposable Mixed Integer Programs (MIP), we address a fundamental research question: whether good decomposition patterns can be consistently found by looking only at static properties of MIP input instances. We adopt a data-driven approach, devising a … Read more

RSG: Beating Subgradient Method without Smoothness and Strong Convexity

Published: 2016/11/01

Qihang Lin

Tianbao Yang

Convex and Nonsmooth Optimization improved convergence, local error bound, machine learning, subgradient method

In this paper, we study the efficiency of a Restarted SubGradient (RSG) method that periodically restarts the standard subgradient method (SG). We show that, when applied to a broad class of convex optimization problems, the RSG method can find an $\epsilon$-optimal solution with a lower complexity than the SG method. In particular, we first … Read more
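
The restart idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed number of iterations per stage, the averaging of iterates within a stage, the halving of the step size between stages, and the toy objective $\|x - c\|_1$ are all assumptions made for illustration, as are the names `subgradient_stage`, `rsg`, and `l1_objective`.

```python
def subgradient_stage(subgrad, x0, step, iters):
    """Run `iters` fixed-step subgradient steps from x0; return the averaged iterate."""
    x = list(x0)
    avg = [0.0] * len(x0)
    for _ in range(iters):
        # Accumulate the running average of the iterates visited in this stage.
        for i, xi in enumerate(x):
            avg[i] += xi / iters
        g = subgrad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return avg


def rsg(subgrad, x0, step0, iters_per_stage, stages, shrink=2.0):
    """Restart the subgradient method from each stage's average (assumed schedule)."""
    x, step = list(x0), step0
    for _ in range(stages):
        x = subgradient_stage(subgrad, x, step, iters_per_stage)
        step /= shrink  # assumption: geometric step-size decrease between stages
    return x


# Toy nonsmooth convex problem (hypothetical): minimize ||x - c||_1.
c = [1.0, -2.0]


def l1_objective(x):
    return sum(abs(xi - ci) for xi, ci in zip(x, c))


def l1_subgrad(x):
    # sign(x_i - c_i), with 0 chosen as the subgradient at a kink
    return [(xi > ci) - (xi < ci) for xi, ci in zip(x, c)]


x_start = [10.0, 10.0]
x_final = rsg(l1_subgrad, x_start, step0=1.0, iters_per_stage=50, stages=10)
print(l1_objective(x_start), l1_objective(x_final))
```

Each stage behaves like plain SG with a constant step; the restart reuses the stage average as the next starting point, so later stages with smaller steps begin close to the solution.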

Exact and Inexact Subsampled Newton Methods for Optimization

Published: 2016/09/27

Raghu Bollapragada

Richard H. Byrd

Jorge Nocedal

Convex Optimization, Nonlinear Optimization, Stochastic Programming machine learning, subsampling

The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how to coordinate the accuracy in the gradient and Hessian to yield a superlinear rate of convergence in expectation. The second part of … Read more
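
As a rough illustration of the subsampling idea described above, the sketch below takes Newton-like steps in which the gradient and the Hessian are each estimated from an independent random subsample. It is a simplified sketch, not the method analyzed in the paper: the one-dimensional least-squares loss $f(w) = \frac{1}{2n}\sum_i (a_i w - b_i)^2$, the fixed sample sizes, the noiseless synthetic data, and the name `subsampled_newton_1d` are all assumptions.

```python
import random


def subsampled_newton_1d(a, b, w0, grad_size, hess_size, iters, seed=0):
    """Newton-like iteration with subsampled gradient and Hessian (1-D least squares)."""
    rng = random.Random(seed)
    n = len(a)
    w = w0
    for _ in range(iters):
        grad_idx = rng.sample(range(n), grad_size)  # subsample for the gradient
        hess_idx = rng.sample(range(n), hess_size)  # independent subsample for the Hessian
        g = sum(a[i] * (a[i] * w - b[i]) for i in grad_idx) / grad_size
        h = sum(a[i] * a[i] for i in hess_idx) / hess_size
        w -= g / h  # Newton step using the two estimates
    return w


# Hypothetical noiseless data: b_i = 2 * a_i, so the minimizer is w = 2.
data_rng = random.Random(1)
a = [data_rng.uniform(0.5, 1.5) for _ in range(200)]
b = [2.0 * ai for ai in a]

w_hat = subsampled_newton_1d(a, b, w0=0.0, grad_size=50, hess_size=25, iters=20)
print(w_hat)
```

Because the toy data are noiseless, every subsampled gradient vanishes at the solution and the iteration settles there; with noisy data, fixed sample sizes would leave a residual error, which is one reason the accuracy of the two estimates needs to be coordinated.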

Max-Norm Optimization for Robust Matrix Recovery

Published: 2016/09/25

Convex Optimization, Semi-definite Programming, Statistics admm, machine learning, matrix completion, semidefinite programming

This paper studies the matrix completion problem under arbitrary sampling schemes. We propose a new estimator incorporating both max-norm and nuclear-norm regularization, based on which we can conduct efficient low-rank matrix recovery using a random subset of entries observed with additive noise under general non-uniform and unknown sampling distributions. This method significantly relaxes the uniform … Read more

A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine Learning

Published: 2016/09/21, Updated: 2016/10/04

Aleksandr Y. Aravkin

Damek Davis

Nonlinear Optimization, Nonsmooth Optimization, Statistics machine learning, nonconvex optimization, saga, smart, svrg, trimmed estimators, variance reduction

Machine learning theory typically assumes that training data is unbiased and not adversarially generated. When real training data deviates from these assumptions, trained models make erroneous predictions, sometimes with disastrous effects. Robust losses, such as the Huber norm, are designed to mitigate the effects of such contaminated data, but they are limited to the regression … Read more

Second-order optimality and beyond: characterization and evaluation complexity in convexly-constrained nonlinear optimization

Published: 2016/08/16

Coralia Cartis

Nicholas I. M. Gould

Philippe L. Toint

Bound-constrained Optimization, Constrained Nonlinear Optimization, Unconstrained Optimization complexity, high-order optimality conditions, machine learning, nonlinear optimization

High-order optimality conditions for convexly-constrained nonlinear optimization problems are analyzed. A corresponding (expensive) measure of criticality for arbitrary order is proposed and extended to define high-order $\epsilon$-approximate critical points. This new measure is then used within a conceptual trust-region algorithm to show that, if derivatives of the objective function up to order $q \geq 1$ … Read more

Convex Variational Formulations for Learning Problems

Published: 2016/08/13, Updated: 2016/08/18

Pedro Borges de Melo

Data-Mining classification, machine learning, nonlinear classification, nonlinear regression, optimization, quadratic programming, regression, variational formulations

In this article, we introduce new techniques to solve the nonlinear regression problem and the nonlinear classification problem. Our benchmarks suggest that our method for regression is significantly more effective when compared to classical methods and our method for classification is competitive. Our list of classical methods includes least squares, random forests, decision trees, boosted … Read more

Optimization Methods for Large-Scale Machine Learning

Published: 2016/06/15, Updated: 2019/08/26

Léon Bottou

Frank E. Curtis

Jorge Nocedal

Convex and Nonsmooth Optimization, Nonlinear Optimization, Stochastic Programming algorithmic complexity, machine learning, noise reduction methods, numerical optimization, second-order methods, stochastic gradient methods

This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of … Read more

A Stochastic Majorize-Minimize Subspace Algorithm for Online Penalized Least Squares Estimation

Published: 2016/01/05

Émilie Chouzenoux

Jean-Christophe Pesquet

Convex Optimization, Unconstrained Optimization adaptive filtering, descent methods, filter identification, machine learning, majorization-minimization, memory gradient methods, newton method, optimization, recursive algorithms, sparsity, stochastic approximation, subspace algorithms

Stochastic approximation techniques play an important role in solving many problems encountered in machine learning or adaptive signal processing. In these contexts, the statistics of the data are often unknown a priori or their direct computation is too intensive, and they thus have to be estimated online from the observed signals. For batch optimization of … Read more

On the Convergence of Multi-Block Alternating Direction Method of Multipliers and Block Coordinate Descent Method

Published: 2015/08/01, Updated: 2015/09/09

Convex Optimization, Nonsmooth Optimization alternating direction method of multipliers, block coordinate descent method, iterate convergence, large-scale optimization, machine learning, random permutation

The paper answers several open questions about the alternating direction method of multipliers (ADMM) and the block coordinate descent (BCD) method, which are now widely used to solve large-scale convex optimization problems in many fields. For ADMM, there is still a lack of theoretical understanding of the algorithm when the objective function is not separable … Read more