Dual optimal design and the Christoffel-Darboux polynomial

The purpose of this short note is to show that the Christoffel-Darboux polynomial, useful in approximation theory and data science, arises naturally when deriving the dual to the problem of semi-algebraic D-optimal experimental design in statistics. It uses only elementary notions of convex analysis.
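To make the object concrete, here is a minimal numpy sketch (not the note's derivation) of evaluating the degree-d Christoffel-Darboux polynomial p(x) = v_d(x)^T M_d^{-1} v_d(x) of a discrete design on the real line, where v_d(x) = (1, x, ..., x^d) and M_d is the design's moment matrix; the points `pts` and weights `w` are illustrative placeholders.

```python
# Sketch, assuming a univariate design with points `pts` and weights `w`:
# evaluate p(x) = v_d(x)^T M_d^{-1} v_d(x), the Christoffel-Darboux polynomial.
import numpy as np

def christoffel_darboux(pts, w, d, x):
    V = np.vander(pts, N=d + 1, increasing=True)        # rows are v_d(x_i)^T
    M = V.T @ (w[:, None] * V)                           # moment matrix M_d
    vx = np.vander(np.atleast_1d(x), N=d + 1, increasing=True)
    return np.einsum('ij,jk,ik->i', vx, np.linalg.inv(M), vx)

pts = np.linspace(-1.0, 1.0, 7)        # hypothetical design points
w = np.full(7, 1.0 / 7)                # uniform design weights
print(christoffel_darboux(pts, w, d=3, x=np.array([0.0, 0.9])))
```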

Stochastic Discrete First-order Algorithm for Feature Subset Selection

This paper addresses the problem of selecting a significant subset of candidate features to use for multiple linear regression. Bertsimas et al. (2016) recently proposed the discrete first-order (DFO) algorithm to efficiently find near-optimal solutions to this problem. However, this algorithm is unable to escape from locally optimal solutions. To resolve this, we propose a …
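For orientation, a deterministic DFO iteration amounts to a gradient step on the least-squares loss followed by hard thresholding onto the k largest-magnitude coefficients. The sketch below illustrates that basic step under this reading; it is not the paper's stochastic variant, and the test data are made up.

```python
# Minimal sketch of a discrete first-order (hard-thresholding) step for
# k-sparse least squares, in the spirit of Bertsimas et al. (2016).
import numpy as np

def dfo(X, y, k, iters=200):
    L = np.linalg.norm(X, 2) ** 2                 # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        g = X.T @ (X @ beta - y)                  # gradient of 0.5*||y - X beta||^2
        z = beta - g / L
        keep = np.argsort(np.abs(z))[-k:]         # indices of the k largest entries
        beta = np.zeros_like(z)
        beta[keep] = z[keep]                      # hard-thresholding projection
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
y = X[:, :3] @ np.array([1.0, -2.0, 3.0]) + 0.1 * rng.standard_normal(50)
print(np.nonzero(dfo(X, y, k=3))[0])
```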

Best Subset Selection via Cross-validation Criterion

This paper is concerned with the cross-validation criterion for best subset selection in a linear regression model. In contrast with the use of statistical criteria (e.g., Mallows’ $C_p$, AIC, BIC, and various information criteria), cross-validation requires only mild assumptions, namely that samples are identically distributed and that training and validation samples are independent. For this …
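As a small illustration of the criterion itself, the sketch below scores a candidate subset of columns by its K-fold cross-validated squared prediction error and picks the best subset by exhaustive enumeration; this is only meant to show what is being minimized, not the paper's formulation, and the function names are hypothetical.

```python
# Sketch: K-fold cross-validation error of a column subset, and exhaustive
# search for the subset minimizing it (small p only, for illustration).
import itertools
import numpy as np

def cv_error(X, y, cols, folds=5):
    n = len(y)
    parts = np.array_split(np.random.default_rng(0).permutation(n), folds)
    err = 0.0
    for val in parts:
        tr = np.setdiff1d(np.arange(n), val)
        b, *_ = np.linalg.lstsq(X[np.ix_(tr, cols)], y[tr], rcond=None)
        err += np.sum((y[val] - X[np.ix_(val, cols)] @ b) ** 2)
    return err / n

def best_subset_cv(X, y, max_size=3):
    p = X.shape[1]
    cands = (list(c) for s in range(1, max_size + 1)
             for c in itertools.combinations(range(p), s))
    return min(cands, key=lambda c: cv_error(X, y, c))
```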

Variational analysis perspective on linear convergence of some first order methods for nonsmooth convex optimization problems

We study the linear convergence of some first-order methods, such as the proximal gradient method (PGM), the proximal alternating linearized minimization (PALM) algorithm, and the randomized block coordinate proximal gradient method (R-BCPGM), for minimizing the sum of a smooth convex function and a nonsmooth convex function from a variational analysis perspective. We introduce a new analytic …
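For reference, the PGM analyzed here iterates a gradient step on the smooth part followed by the proximal map of the nonsmooth part. A minimal sketch for the lasso instance (f smooth least squares, g the l1 norm, whose prox is soft thresholding) is given below; it illustrates the method being analyzed, not a contribution of the paper.

```python
# Sketch of the proximal gradient method for min 0.5*||Ax - b||^2 + lam*||x||_1.
import numpy as np

def prox_l1(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)    # prox of t*||.||_1

def pgm(A, b, lam, iters=500):
    t = 1.0 / np.linalg.norm(A, 2) ** 2                    # step size 1/L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                           # gradient of the smooth part
        x = prox_l1(x - t * grad, t * lam)                 # proximal gradient step
    return x
```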

Approximate Positively Correlated Distributions and Approximation Algorithms for D-optimal Design

Experimental design is a classical problem in statistics and has also found new applications in machine learning. In the experimental design problem, the aim is to estimate an unknown m-dimensional vector x from linear measurements, where Gaussian noise is added to each measurement. The goal is to pick k out of the given …
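To fix ideas, the D-optimality criterion selects a subset S of the measurement vectors maximizing log det of the resulting information matrix. The sketch below is a simple greedy heuristic for that objective, included only to make the criterion concrete; it is not the approximation algorithm of the paper.

```python
# Sketch: greedy selection of k rows of A maximizing log det(sum_{i in S} a_i a_i^T).
# A small ridge term keeps the determinant defined before k >= m rows are chosen.
import numpy as np

def greedy_d_optimal(A, k, eps=1e-6):
    n, m = A.shape
    S, M = [], eps * np.eye(m)
    for _ in range(k):
        gains = [np.linalg.slogdet(M + np.outer(a, a))[1] if i not in S else -np.inf
                 for i, a in enumerate(A)]
        j = int(np.argmax(gains))
        S.append(j)
        M += np.outer(A[j], A[j])
    return S
```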

Mixed Integer Quadratic Optimization Formulations for Eliminating Multicollinearity Based on Variance Inflation Factor

The variance inflation factor, VIF, is the most frequently used indicator for detecting multicollinearity in multiple linear regression models. This paper proposes two mixed integer quadratic optimization formulations for selecting the best subset of explanatory variables under upper-bound constraints on the VIF of selected variables. Computational results illustrate the effectiveness of our optimization formulations based on …
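As a reminder of the indicator being bounded, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing x_j on the remaining variables. A minimal sketch of that computation follows; it only illustrates the quantity, not the MIQO formulations.

```python
# Sketch: variance inflation factors of the columns of X, computed by
# regressing each centered column on the others.
import numpy as np

def vif(X):
    Xc = X - X.mean(axis=0)                       # center, so the intercept drops out
    out = []
    for j in range(Xc.shape[1]):
        y = Xc[:, j]
        Z = np.delete(Xc, j, axis=1)
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r2 = 1.0 - np.sum((y - Z @ b) ** 2) / np.sum(y ** 2)
        out.append(1.0 / (1.0 - r2))              # VIF_j = 1 / (1 - R_j^2)
    return np.array(out)
```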

Best subset selection for eliminating multicollinearity

This paper proposes a method for eliminating multicollinearity from linear regression models. Specifically, we select the best subset of explanatory variables subject to the upper bound on the condition number of the correlation matrix of selected variables. We first develop a cutting plane algorithm that, to approximate the condition number constraint, iteratively appends valid inequalities …
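The constraint being approximated can be stated very compactly: the ratio of the largest to the smallest eigenvalue of the subset's correlation matrix must not exceed a bound kappa. The sketch below is only that feasibility check, under assumed inputs, and does not reproduce the cutting-plane machinery.

```python
# Sketch: does the correlation matrix of the selected columns satisfy
# a condition-number bound kappa?
import numpy as np

def subset_ok(X, cols, kappa):
    R = np.corrcoef(X[:, cols], rowvar=False)    # correlation matrix of the subset
    eig = np.linalg.eigvalsh(R)                  # eigenvalues in ascending order
    return eig[-1] / eig[0] <= kappa             # condition number <= kappa
```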

Mixed Integer Second-Order Cone Programming Formulations for Variable Selection

This paper concerns the method of selecting the best subset of explanatory variables in a multiple linear regression model. To evaluate a subset regression model, goodness-of-fit measures such as the adjusted R^2, AIC, and BIC are generally employed. Although variable selection is usually handled via a stepwise regression method, this method does not always provide the …
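For concreteness, the sketch below computes the three measures named above for one candidate subset, using the standard textbook definitions in terms of the residual sum of squares; it is not tied to the paper's second-order cone formulations.

```python
# Sketch: adjusted R^2 and Gaussian AIC/BIC of a subset regression model.
import numpy as np

def subset_criteria(X, y, cols):
    n, k = len(y), len(cols)
    Z = np.column_stack([np.ones(n), X[:, cols]])        # intercept + selected columns
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ b) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))
    aic = n * np.log(rss / n) + 2 * (k + 1)
    bic = n * np.log(rss / n) + np.log(n) * (k + 1)
    return adj_r2, aic, bic
```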

Newton-Like Methods for Sparse Inverse Covariance Estimation

We propose two classes of second-order optimization methods for solving the sparse inverse covariance estimation problem. The first approach, which we call the Newton-LASSO method, minimizes a piecewise quadratic model of the objective function at every iteration to generate a step. We employ the fast iterative shrinkage-thresholding algorithm (FISTA) to solve this subproblem. The …
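For context, the underlying problem is the l1-regularized maximum likelihood estimate of a precision matrix: minimize -log det(Theta) + tr(S Theta) + lam*||Theta||_1 over positive definite Theta. The sketch below only defines that objective and the gradient of its smooth part; it is not the Newton-LASSO algorithm itself.

```python
# Sketch: sparse inverse covariance (graphical lasso) objective and the
# gradient of its smooth part, S - Theta^{-1}.
import numpy as np

def objective(Theta, S, lam):
    sign, logdet = np.linalg.slogdet(Theta)
    assert sign > 0, "Theta must be positive definite"
    return -logdet + np.trace(S @ Theta) + lam * np.abs(Theta).sum()

def smooth_gradient(Theta, S):
    return S - np.linalg.inv(Theta)
```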

A New Computational Approach to Density Estimation with Semidefinite Programming

Density estimation is a classical and important problem in statistics. The aim of this paper is to develop a new computational approach to density estimation based on semidefinite programming (SDP), a new technology developed in optimization in the last decade. We express a density as the product of a nonnegative polynomial and a base density …
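To illustrate the representation, a density of this form can be written as f(x) = q(x) * phi(x) with phi a base density and q(x) = v(x)^T Q v(x) a polynomial kept nonnegative by requiring Q to be positive semidefinite. The sketch below evaluates such a product with a standard normal base and a hypothetical Q; the SDP that fits Q to data, and the normalization making f integrate to one, are not reproduced.

```python
# Sketch: evaluate f(x) = q(x) * phi(x) with q(x) = v(x)^T Q v(x), Q PSD,
# and phi the standard normal density (normalization of f is omitted).
import numpy as np

def poly_times_gaussian(x, Q):
    d = Q.shape[0] - 1
    v = np.vander(np.atleast_1d(x), N=d + 1, increasing=True)   # (1, x, ..., x^d)
    q = np.einsum('ij,jk,ik->i', v, Q, v)                        # q(x) >= 0 when Q is PSD
    phi = np.exp(-np.atleast_1d(x) ** 2 / 2) / np.sqrt(2 * np.pi)
    return q * phi

Q = np.array([[1.0, 0.0], [0.0, 0.5]])   # PSD, so q(x) = 1 + 0.5*x^2 >= 0
print(poly_times_gaussian(np.array([-1.0, 0.0, 2.0]), Q))
```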