Best Subset Selection via Cross-validation Criterion

This paper is concerned with the cross-validation criterion for best subset selection in a linear regression model. Unlike statistical criteria (e.g., Mallows’ $C_p$, AIC, BIC, and various information criteria), cross-validation requires only mild assumptions, namely, that samples are identically distributed and that training and validation samples are independent. For this …
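The following Python sketch (assuming NumPy) makes the criterion concrete. It scores each candidate subset by its k-fold cross-validation error for ordinary least squares with an intercept, then returns the subset with the smallest error. The names `cv_error` and `best_subset_cv` are hypothetical. The exhaustive enumeration over all $2^p - 1$ non-empty subsets is a swapped-in illustration, not the paper's optimization method, and it is practical only for small $p$.

```python
from itertools import combinations

import numpy as np


def cv_error(X, y, subset, n_folds=5):
    """Mean squared validation error of OLS (with intercept) on the columns in `subset`."""
    n = X.shape[0]
    cols = list(subset)
    # Contiguous folds; shuffle the rows beforehand if they are ordered.
    folds = np.array_split(np.arange(n), n_folds)
    sse = 0.0
    for val_idx in folds:
        train = np.ones(n, dtype=bool)
        train[val_idx] = False
        # Design matrices with an intercept column prepended.
        A_tr = np.column_stack([np.ones(train.sum()), X[train][:, cols]])
        A_va = np.column_stack([np.ones(len(val_idx)), X[val_idx][:, cols]])
        beta, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        sse += float(np.sum((y[val_idx] - A_va @ beta) ** 2))
    return sse / n


def best_subset_cv(X, y, n_folds=5):
    """Exhaustively score every non-empty subset and return the one with the smallest CV error."""
    p = X.shape[1]
    candidates = (s for k in range(1, p + 1) for s in combinations(range(p), k))
    return min(candidates, key=lambda s: cv_error(X, y, s, n_folds))
```

The validation rows never enter the fit, which is where the independence between training and validation samples is used.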

Subset selection in sparse matrices

In subset selection, we search for the best linear predictor that involves a small subset of variables. From a computational complexity viewpoint, subset selection is NP-hard, and few classes of instances are known to be solvable in polynomial time. Using mainly tools from discrete geometry, we show that some sparsity conditions on the original data matrix allow …
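A naive exact search needs one least-squares fit per size-$k$ support. The sketch below (Python with NumPy) makes that count explicit. The function name `best_k_subset` is hypothetical, and no intercept is fitted, for brevity. It does not implement the discrete-geometry approach described above; it only shows the combinatorial search that such results aim to avoid.

```python
from itertools import combinations
from math import comb

import numpy as np


def best_k_subset(X, y, k):
    """Exact best size-k subset by residual sum of squares (no intercept, for brevity)."""
    best, best_rss, evaluated = None, np.inf, 0
    for s in combinations(range(X.shape[1]), k):
        A = X[:, list(s)]
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ beta) ** 2))
        evaluated += 1
        if rss < best_rss:
            best, best_rss = s, rss
    # One least-squares fit per candidate support: C(p, k) fits in total.
    assert evaluated == comb(X.shape[1], k)
    return best, best_rss, evaluated
```

The count `math.comb(p, k)` grows quickly. For $p = 30$ and $k = 15$ it already exceeds $10^8$, which is why polynomially solvable special cases are of interest.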

Mixed Integer Quadratic Optimization Formulations for Eliminating Multicollinearity Based on Variance Inflation Factor

The variance inflation factor (VIF) is the most frequently used indicator for detecting multicollinearity in multiple linear regression models. This paper proposes two mixed integer quadratic optimization formulations for selecting the best subset of explanatory variables under upper-bound constraints on the VIFs of the selected variables. Computational results illustrate the effectiveness of our optimization formulations based on …
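For the $j$-th selected variable, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination from regressing that variable on the other selected variables. Equivalently, the VIFs are the diagonal entries of the inverse of the correlation matrix of the selected variables. The minimal NumPy sketch below checks the constraint for a given subset. The helper names and the default bound of 10 (a common rule of thumb) are assumptions. The code only evaluates the constraint and does not reproduce the paper's MIQO formulations.

```python
import numpy as np


def vif(X, subset):
    """VIFs of the selected columns: the diagonal of the inverse of their correlation matrix."""
    # atleast_2d handles a single selected column, where corrcoef returns a scalar.
    R = np.atleast_2d(np.corrcoef(X[:, list(subset)], rowvar=False))
    return np.diag(np.linalg.inv(R))


def satisfies_vif_bound(X, subset, bound=10.0):
    """True if every selected variable has VIF <= bound (10 is a common rule of thumb)."""
    return bool(np.all(vif(X, subset) <= bound))
```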

Best subset selection for eliminating multicollinearity

This paper proposes a method for eliminating multicollinearity from linear regression models. Specifically, we select the best subset of explanatory variables subject to an upper bound on the condition number of the correlation matrix of the selected variables. We first develop a cutting plane algorithm that, to approximate the condition number constraint, iteratively appends valid inequalities …
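Because the correlation matrix is symmetric and positive definite, its condition number is the ratio of its largest to its smallest eigenvalue. The sketch below uses NumPy and rests on two assumptions: a fixed subset size $k$ and the residual sum of squares as the objective. It enumerates subsets and discards those that violate the bound. This brute-force search stands in for the paper's cutting plane algorithm and does not implement it. All names are hypothetical.

```python
from itertools import combinations

import numpy as np


def corr_condition_number(X, subset):
    """Condition number (max/min eigenvalue) of the correlation matrix of the selected columns."""
    R = np.atleast_2d(np.corrcoef(X[:, list(subset)], rowvar=False))
    eig = np.linalg.eigvalsh(R)  # eigenvalues of the symmetric matrix R, in ascending order
    return eig[-1] / eig[0]


def best_subset_cond(X, y, k, kappa_max):
    """Among size-k subsets with condition number <= kappa_max, return the one with the smallest RSS."""
    best, best_rss = None, np.inf
    for s in combinations(range(X.shape[1]), k):
        if corr_condition_number(X, s) > kappa_max:
            continue  # violates the multicollinearity bound
        A = np.column_stack([np.ones(len(y)), X[:, list(s)]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ beta) ** 2))
        if rss < best_rss:
            best, best_rss = s, rss
    return best
```

Tightening `kappa_max` can exclude the subset with the smallest RSS when its columns are nearly collinear. That is the trade-off the constraint is meant to impose.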

Subset Selection by Mallows’ Cp: A Mixed Integer Programming Approach

This paper concerns a method of selecting the best subset of explanatory variables for a linear regression model. Employing Mallows’ $C_p$ as a goodness-of-fit measure, we formulate the subset selection problem as a mixed integer quadratic programming problem. Computational results demonstrate that our method provides the best subset of variables in a few seconds when …
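One common form of Mallows’ $C_p$ for a subset $S$ is $C_p = \mathrm{RSS}_S / \hat\sigma^2 - n + 2(|S| + 1)$. Here $\hat\sigma^2 = \mathrm{RSS}_{\text{full}} / (n - p - 1)$ is estimated from the model with all $p$ variables and an intercept. The paper's exact convention may differ. The NumPy sketch below evaluates this criterion and minimizes it by exhaustive enumeration, which stands in for the paper's MIQP formulation. The helper names are hypothetical.

```python
from itertools import combinations

import numpy as np


def rss(X, y, subset):
    """Residual sum of squares of OLS with an intercept on the selected columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)


def mallows_cp(X, y, subset):
    """C_p = RSS_S / sigma2 - n + 2 * (|S| + 1), with sigma2 estimated from the full model."""
    n, p = X.shape
    sigma2 = rss(X, y, range(p)) / (n - p - 1)
    return rss(X, y, subset) / sigma2 - n + 2 * (len(subset) + 1)


def best_subset_cp(X, y):
    """Return the subset (possibly empty) that minimizes C_p."""
    p = X.shape[1]
    subsets = [s for k in range(p + 1) for s in combinations(range(p), k)]
    return min(subsets, key=lambda s: mallows_cp(X, y, s))
```

With this convention, the full model always has $C_p = p + 1$, which is a convenient sanity check.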

On two relaxations of quadratically-constrained cardinality minimization

This paper considers a quadratically-constrained cardinality minimization problem with applications to digital filter design, subset selection for linear regression, and portfolio selection. Two relaxations are investigated: the continuous relaxation of a mixed integer formulation, and an optimized diagonal relaxation that exploits a simple special case of the problem. For the continuous relaxation, an absolute upper …
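The following sketch gives intuition for the "simple special case". It assumes the problem has the form $\min \|x\|_0$ subject to $(x - c)^\top Q (x - c) \le \varepsilon$ with $Q$ positive definite, and that the special case is a diagonal $Q = \mathrm{diag}(q)$. Both are assumptions, since the abstract is truncated. The diagonal case separates by coordinate: on the support, $x_i = c_i$ costs nothing, while setting $x_i = 0$ costs $q_i c_i^2$. Zeroing the cheapest coordinates first is therefore optimal. The function name below is hypothetical.

```python
import numpy as np


def diagonal_qccm(q, c, eps):
    """Exactly solve min ||x||_0 s.t. sum_i q_i (x_i - c_i)^2 <= eps, for q_i > 0 and eps >= 0.

    Kept coordinates take x_i = c_i (zero cost); zeroing coordinate i costs q_i * c_i**2.
    Zeroing the cheapest coordinates first maximizes the number of zeros within the budget.
    """
    q, c = np.asarray(q, dtype=float), np.asarray(c, dtype=float)
    cost = q * c**2
    x = c.copy()
    spent = 0.0
    for i in np.argsort(cost):  # cheapest coordinates first
        if spent + cost[i] > eps:
            break  # costs are sorted, so no later coordinate fits either
        spent += cost[i]
        x[i] = 0.0
    return x
```

Suppose a diagonal $D$ satisfies $D \preceq Q$. Then $(x-c)^\top D (x-c) \le (x-c)^\top Q (x-c)$, so solving the diagonal problem with $D$ in place of $Q$ gives a lower bound on the original optimal value. How the paper optimizes $D$ is not stated in the text shown.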