Machine Learning and Portfolio Optimization

The portfolio optimization model has limited impact in practice due to estimation issues when applied with real data. To address this, we adapt two machine learning methods, regularization and cross-validation, for portfolio optimization. First, we introduce performance-based regularization (PBR), where the idea is to constrain the sample variances of the estimated portfolio risk and return, … Read more

Playing with Duality: An Overview of Recent Primal-Dual Approaches for Solving Large-Scale Optimization Problems

Optimization methods are at the core of many problems in signal/image processing, computer vision, and machine learning. For a long time, it has been recognized that looking at the dual of an optimization problem may drastically simplify its solution. Deriving efficient strategies which jointly brings into play the primal and the dual problems is however … Read more

A Stochastic Quasi-Newton Method for Large-Scale Optimization

Abstract The question of how to incorporate curvature information in stochastic approximation methods is challenging. The direct application of classical quasi- Newton updating techniques for deterministic optimization leads to noisy curvature estimates that have harmful effects on the robustness of the iteration. In this paper, we propose a stochastic quasi-Newton method that is efficient, robust … Read more

Incremental Accelerated Gradient Methods for SVM Classification: Study of the Constrained Approach

We investigate constrained first order techniques for training Support Vector Machines (SVM) for online classification tasks. The methods exploit the structure of the SVM training problem and combine ideas of incremental gradient technique, gradient acceleration and successive simple calculations of Lagrange multipliers. Both primal and dual formulations are studied and compared. Experiments show that the … Read more

Newton-Like Methods for Sparse Inverse Covariance Estimation

We propose two classes of second-order optimization methods for solving the sparse inverse covariance estimation problem. The first approach, which we call the Newton-LASSO method, minimizes a piecewise quadratic model of the objective function at every iteration to generate a step. We employ the fast iterative shrinkage thresholding method (FISTA) to solve this subproblem. The … Read more

Factoring nonnegative matrices with linear programs

This paper describes a new approach for computing nonnegative matrix factorizations (NMFs) with linear programming. The key idea is a data-driven model for the factorization, in which the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that … Read more

D-ADMM: A Communication-Efficient Distributed Algorithm For Separable Optimization

We propose a distributed algorithm, named D-ADMM, for solving separable optimization problems in networks of interconnected nodes or agents. In a separable optimization problem, the cost function is the sum of all the agents’ private cost functions, and the constraint set is the intersection of all the agents’ private constraint sets. We require the private … Read more

A First-Order Smoothing Technique for a Class of Large-Scale Linear Programs

We study a class of linear programming (LP) problems motivated by large-scale machine learning applications. After reformulating the LP as a convex nonsmooth problem, we apply Nesterov’s primal-dual smoothing technique. It turns out that the iteration complexity of the smoothing technique depends on a parameter $\th$ that arises because we need to bound the originally … Read more

Sample Size Selection in Optimization Methods for Machine Learning

This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance … Read more

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented *without … Read more