machine learning – Page 10 – Optimization Online

Newton-Like Methods for Sparse Inverse Covariance Estimation

Published: 2012/06/13

Categories: Convex Optimization

Tags: convex optimization, machine learning, statistics

We propose two classes of second-order optimization methods for solving the sparse inverse covariance estimation problem. The first approach, which we call the Newton-LASSO method, minimizes a piecewise quadratic model of the objective function at every iteration to generate a step. We employ the fast iterative shrinkage thresholding method (FISTA) to solve this subproblem. The …
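To make the FISTA step concrete, here is a minimal sketch on a generic l1-regularized quadratic, min 0.5 x'Ax - b'x + lam*||x||_1. This is not the paper's subproblem, which is posed over matrices; the names `fista` and `soft_threshold`, the problem form, and the requirement that `L` bound the largest eigenvalue of `A` are assumptions made for illustration.

```python
import math

def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar."""
    return math.copysign(max(abs(v) - tau, 0.0), v)

def fista(A, b, lam, L, iters=500):
    """Minimize 0.5*x'Ax - b'x + lam*||x||_1 for a symmetric PSD matrix A.

    L is assumed to be an upper bound on the largest eigenvalue of A.
    """
    n = len(b)
    x = [0.0] * n
    y = x[:]          # extrapolated point
    t = 1.0           # momentum parameter
    for _ in range(iters):
        # Gradient of the smooth part at the extrapolated point.
        grad = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(n)]
        # Gradient step followed by shrinkage (the proximal step).
        x_new = [soft_threshold(y[i] - grad[i] / L, lam / L) for i in range(n)]
        # Momentum update that distinguishes FISTA from plain ISTA.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [x_new[i] + ((t - 1.0) / t_new) * (x_new[i] - x[i]) for i in range(n)]
        x, t = x_new, t_new
    return x
```

On a diagonal `A` the problem separates by coordinate, which makes the result easy to check by hand: the shrinkage step sets a coordinate to exactly zero whenever the gradient there is smaller in magnitude than `lam`.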

Factoring nonnegative matrices with linear programs

Published: 2012/06/06

Categories: Data-Mining, Linear Programming

Tags: linear programming, machine learning, multicore, nonnegative matrix factorization, parallel computing, stochastic gradient descent

This paper describes a new approach for computing nonnegative matrix factorizations (NMFs) with linear programming. The key idea is a data-driven model for the factorization, in which the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that …

D-ADMM: A Communication-Efficient Distributed Algorithm For Separable Optimization

Published: 2012/02/13

Categories: Convex Optimization, Network Optimization

Tags: alternating direction method of multipliers, compressed sensing, consensus, distributed algorithms, machine learning, sensor networks

We propose a distributed algorithm, named D-ADMM, for solving separable optimization problems in networks of interconnected nodes or agents. In a separable optimization problem, the cost function is the sum of all the agents’ private cost functions, and the constraint set is the intersection of all the agents’ private constraint sets. We require the private …
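In symbols, the separable problem described in this abstract can be written as follows. The notation is assumed here for illustration and is not taken from the paper: $P$ is the number of agents, $f_p$ is agent $p$'s private cost function, and $X_p$ is its private constraint set.

```latex
\[
  \min_{x \in \mathbb{R}^n} \; \sum_{p=1}^{P} f_p(x)
  \quad \text{subject to} \quad
  x \in \bigcap_{p=1}^{P} X_p
\]
```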

A First-Order Smoothing Technique for a Class of Large-Scale Linear Programs

Published: 2011/11/07

Samuel Burer

Jieqiu Chen

Categories: Data-Mining, Linear Programming, Nonsmooth Optimization

Tags: large scale linear programming, machine learning, nonsmooth optimization, smoothing technique

We study a class of linear programming (LP) problems motivated by large-scale machine learning applications. After reformulating the LP as a convex nonsmooth problem, we apply Nesterov’s primal-dual smoothing technique. It turns out that the iteration complexity of the smoothing technique depends on a parameter $\theta$ that arises because we need to bound the originally …

Sample Size Selection in Optimization Methods for Machine Learning

Published: 2011/11/02

Categories: Nonlinear Optimization, Stochastic Programming

Tags: l1-regularization, machine learning, optimization

This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large-scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance …

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

Published: 2011/06/28, Updated: 2011/11/11

Categories: Data-Mining, Nonlinear Optimization, Stochastic Programming

Tags: incremental gradient methods, machine learning, multicore, parallel computing, stochastic gradient descent

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without …
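The lock-free access pattern can be illustrated with a toy sketch in which several threads update a shared parameter vector with no synchronization. Everything here is an assumption for illustration rather than the paper's implementation: the function name `hogwild_sgd`, the separable objective f(x) = sum_i (x[i] - targets[i])^2, and the step size. Under standard CPython's global interpreter lock the threads do not run in true parallel, so this shows the unsynchronized update pattern, not a speedup.

```python
import random
import threading

def hogwild_sgd(targets, n_threads=4, steps_per_thread=5000, lr=0.1):
    """Lock-free SGD on the toy objective f(x) = sum_i (x[i] - targets[i])**2.

    Each stochastic gradient touches a single coordinate, so updates are sparse.
    All threads share x and write to it without taking any lock.
    """
    x = [0.0] * len(targets)

    def worker(seed):
        rng = random.Random(seed)  # per-thread generator
        for _ in range(steps_per_thread):
            i = rng.randrange(len(x))
            grad_i = 2.0 * (x[i] - targets[i])
            x[i] -= lr * grad_i  # unsynchronized read-modify-write

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return x
```

In this toy problem an update lost to a race only delays progress: every value written is a step from some earlier value of the coordinate toward its target, so no write can move it farther away than it started.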

On the Use of Stochastic Hessian Information in Unconstrained Optimization

Published: 2010/06/16

Categories: Nonlinear Optimization, Stochastic Programming

Tags: limited memory bfgs method, machine learning, unconstrained optimization

This paper describes how to incorporate stochastic curvature information in a Newton-CG method and in a limited memory quasi-Newton method for large-scale optimization. The motivation for this work stems from statistical learning and stochastic optimization applications in which the objective function is the sum of a very large number of loss terms, and …
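One common way to obtain stochastic curvature information when the objective is a sum of many loss terms is to estimate Hessian-vector products from a random subsample, since conjugate gradient iterations need only such products. The sketch below assumes a logistic loss and uses the hypothetical helper name `subsampled_hessian_vector`; neither is stated in the abstract.

```python
import math
import random

def subsampled_hessian_vector(X, w, v, sample_size, rng):
    """Estimate H(w) @ v for the average logistic loss from a random subsample.

    For the logistic loss, each term's Hessian is s * (1 - s) * x x^T with
    s = sigmoid(w . x), so the +/-1 labels do not appear.
    """
    idx = rng.sample(range(len(X)), sample_size)  # sample without replacement
    out = [0.0] * len(w)
    for i in idx:
        x = X[i]
        s = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))
        # Scalar s*(1-s)*(x . v); multiplying by x gives this term's H_i @ v.
        coef = s * (1.0 - s) * sum(vj * xj for vj, xj in zip(v, x))
        for j, xj in enumerate(x):
            out[j] += coef * xj / sample_size
    return out
```

When `sample_size` equals the number of data points, the estimate coincides with the exact Hessian-vector product of the average loss; smaller samples trade accuracy for cost.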

An Improved Branch-and-Bound Method for Maximum Monomial Agreement

Published: 2009/11/08

Jonathan Eckstein

Noam Goldberg

Categories: 0-1 Programming, Branch and Cut Algorithms, Data-Mining

Tags: boolean functions, Branch-and-Bound, machine learning, maximum agreement, monomials

The NP-hard Maximum Monomial Agreement (MMA) problem consists of finding a single logical conjunction that best fits a weighted dataset of “positive” and “negative” binary vectors. Computing classifiers using boosting methods involves a maximum agreement subproblem at each iteration, although such subproblems are typically solved by heuristic methods. Here, we describe an exact branch and …
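To make the problem statement concrete, the sketch below searches all 3^n monomials by brute force on a tiny instance. It is not the paper's branch-and-bound method. The scoring rule is also an assumption, since the excerpt does not define "best fits": here the agreement of a monomial is taken to be the absolute difference between the total weight of positive and of negative vectors it covers. The names `covers` and `best_monomial` are hypothetical.

```python
from itertools import product

def covers(monomial, vec):
    """monomial[j] is 1 (x_j must be 1), 0 (x_j must be 0), or None (x_j unused)."""
    return all(m is None or m == v for m, v in zip(monomial, vec))

def best_monomial(data):
    """Exhaustively search all 3**n monomials over n binary variables.

    data: list of (vector, weight, label) with label +1 or -1.
    Returns (agreement, monomial), where agreement is assumed to be
    |sum of label * weight over the vectors the monomial covers|.
    """
    n = len(data[0][0])
    best = (-1.0, None)
    for monomial in product((None, 0, 1), repeat=n):
        score = abs(sum(w * y for vec, w, y in data if covers(monomial, vec)))
        if score > best[0]:
            best = (score, monomial)
    return best
```

The exhaustive loop grows as 3^n, which is why it is only usable on toy instances and why exact methods for larger problems need pruning.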

Machine Learning for Global Optimization

Published: 2009/07/23, Updated: 2009/12/30

Categories: Data-Mining, Global Optimization

Tags: global optimization, machine learning, space trajectory design, support vector machines

In this paper we introduce the LeGO (Learning for Global Optimization) approach for global optimization in which machine learning is used to predict the outcome of a computationally expensive global optimization run, based upon a suitable training performed by standard runs of the same global optimization method. We propose to use a Support Vector Machine …

Convergence and Convergence Rate of Stochastic Gradient Search in the Case of Multiple and Non-Isolated Extrema

Published: 2009/07/07, Updated: 2009/07/17

Vladislav B. Tadic

Categories: Control Applications, Stochastic Programming

Tags: convergence rate, lojasiewicz gradient inequality, machine learning, point-convergence, stochastic gradient search, system identification

The asymptotic behavior of stochastic gradient algorithms is studied. Relying on some results of differential geometry (Lojasiewicz gradient inequality), the almost sure point-convergence is demonstrated and relatively tight almost sure bounds on the convergence rate are derived. In sharp contrast to all existing results of this kind, the asymptotic results obtained here do not require …