AI Hilbert: A New Paradigm for Scientific Discovery by Unifying Data and Background Knowledge

The discovery of scientific formulae that parsimoniously explain natural phenomena and align with existing background theory is a key goal in science. Historically, scientists have derived natural laws by manipulating equations based on existing knowledge, forming new equations, and verifying them experimentally. In recent years, data-driven scientific discovery has emerged as a viable competitor in settings with … Read more

Gain Confidence, Reduce Disappointment: A New Approach to Cross-Validation for Sparse Regression

Ridge regularized sparse linear regression involves selecting a subset of features that explains the relationship between a high-dimensional design matrix and an output vector in an interpretable manner. To select the sparsity and robustness of linear regressors, techniques like leave-one-out cross-validation are commonly used for hyperparameter tuning. However, cross-validation typically increases the cost of sparse … Read more

Optimal Low-Rank Matrix Completion: Semidefinite Relaxations and Eigenvector Disjunctions

Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible. Unfortunately, existing methods for matrix completion are heuristics that, while highly scalable and often identifying high-quality solutions, do not possess any optimality guarantees. We reexamine matrix completion with an optimality-oriented eye. We reformulate … Read more

A Stochastic Benders Decomposition Scheme for Large-Scale Data-Driven Network Design

Network design problems involve constructing edges in a transportation or supply chain network to minimize construction and daily operational costs. We study a data-driven version where operational costs are uncertain and estimated on historical data. This problem is computationally challenging, and instances with as few as 50 nodes cannot be solved to optimality by current … Read more

Sparse PCA With Multiple Components

Sparse Principal Component Analysis (sPCA) is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. This involves solving a sparsity and orthogonality-constrained convex maximization problem, which is extremely computationally challenging. Most existing works address sparse PCA via methods—such as iteratively computing … Read more

Decarbonizing OCP

Problem definition: We present our collaboration with the OCP Group, one of the world’s largest producers of phosphate and phosphate-based products, in support of a green initiative designed to reduce OCP’s carbon emissions significantly. We study the problem of decarbonizing OCP’s electricity supply by installing a mixture of solar panels and batteries to minimize its … Read more