Branch-and-bound Algorithm for Optimal Sparse Canonical Correlation Analysis

Canonical correlation analysis (CCA) is a family of multivariate statistical methods for extracting mutual information contained in multiple datasets. To improve the interpretability of CCA, here we focus on the mixed-integer optimization (MIO) approach to sparse estimation. This approach was first proposed for sparse linear regression in the 1970s, but it has recently received renewed … Read more

Best Subset Selection via Cross-validation Criterion

This paper is concerned with the cross-validation criterion for best subset selection in a linear regression model. In contrast with the use of statistical criteria (e.g., Mallows’ $C_p$, AIC, BIC, and various information criteria), the cross-validation only requires the mild assumptions, namely, samples are identically distributed, and training and validation samples are independent. For this … Read more

Mixed Integer Quadratic Optimization Formulations for Eliminating Multicollinearity Based on Variance Inflation Factor

The variance inflation factor, VIF, is the most frequently used indicator for detecting multicollinearity in multiple linear regression models. This paper proposes two mixed integer quadratic optimization formulations for selecting the best subset of explanatory variables under upper-bound constraints on VIF of selected variables. Computational results illustrate the effectiveness of our optimization formulations based on … Read more

Best subset selection for eliminating multicollinearity

This paper proposes a method for eliminating multicollinearity from linear regression models. Specifically, we select the best subset of explanatory variables subject to the upper bound on the condition number of the correlation matrix of selected variables. We first develop a cutting plane algorithm that, to approximate the condition number constraint, iteratively appends valid inequalities … Read more

Subset Selection by Mallows’ Cp: A Mixed Integer Programming Approach

This paper concerns a method of selecting the best subset of explanatory variables for a linear regression model. Employing Mallows’ C_p as a goodness-of-fit measure, we formulate the subset selection problem as a mixed integer quadratic programming problem. Computational results demonstrate that our method provides the best subset of variables in a few seconds when … Read more

Mixed Integer Second-Order Cone Programming Formulations for Variable Selection

This paper concerns the method of selecting the best subset of explanatory variables in a multiple linear regression model. To evaluate a subset regression model, some goodness-of-fit measures, e.g., adjusted R^2, AIC and BIC, are generally employed. Although variable selection is usually handled via a stepwise regression method, the method does not always provide the … Read more

Semidefinite Programming Based Approaches to Home-away Assignment Problems in Sports Scheduling

For a given schedule of a round-robin tournament and a matrix of distances between homes of teams, an optimal home-away assignment problem is to find a home-away assignment that minimizes the total traveling distance. We propose a technique to transform the problem to MIN RES CUT. We apply Goemans and Williamson’s 0.878-approximation algorithm for MAX … Read more