Deep learning and hyperparameter optimization for assessing one’s eligibility for a subcutaneous implantable cardioverter-defibrillator

In cardiology, it is standard for patients suffering from ventricular arrhythmias (the leading cause of sudden cardiac death) belonging to high risk populations to be treated using Subcutaneous Implantable Cardioverter-Defibrillators (S-ICDs). S-ICDs carry a risk of so-called T Wave Over Sensing (TWOS), which can lead to inappropriate shocks with an inherent health risk. For this … Read more

A polyhedral study of multivariate decision trees

Decision trees are a widely used tool for interpretable machine learning. Multivariate decision trees employ hyperplanes at the branch nodes to route datapoints throughout the tree and yield more compact models than univariate trees. Recently, mixed-integer programming (MIP) has been applied to formulate the optimal decision tree problem. To strengthen MIP formulations, it is crucial … Read more

Inexact Proximal-Gradient Methods with Support Identification

\(\) We consider the proximal-gradient method for minimizing an objective function that is the sum of a smooth function and a non-smooth convex function. A feature that distinguishes our work from most in the literature is that we assume that the associated proximal operator does not admit a closed-form solution. To address this challenge, we … Read more

Simple Alternating Minimization Provably Solves Complete Dictionary Learning

This paper focuses on complete dictionary learning problem, where the goal is to reparametrize a set of given signals as linear combinations of atoms from a learned dictionary. There are two main challenges faced by theoretical and practical studies of dictionary learning: the lack of theoretical guarantees for practically-used heuristic algorithms, and their poor scalability … Read more

Decentralized Stochastic Bilevel Optimization with Improved Per-Iteration Complexity

Bilevel optimization recently has received tremendous attention due to its great success in solving important machine learning problems like meta learning, reinforcement learning, and hyperparameter optimization. Extending single-agent training on bilevel problems to the decentralized setting is a natural generalization, and there has been a flurry of work studying decentralized bilevel optimization algorithms. However, it … Read more

Wasserstein Regularization for 0-1 Loss

Wasserstein distributionally robust optimization (DRO) finds robust solutions by hedging against data perturbation specified by distributions in a Wasserstein ball. The robustness is linked to the regularization effect, which has been studied for continuous losses in various settings. However, existing results cannot be simply applied to the 0-1 loss, which is frequently seen in uncertainty … Read more

Decision-making with Side Information: A Causal Transport Robust Approach

We consider stochastic optimization with side information where, prior to decision making, covariate data are available to inform better decisions. In particular, we propose to consider a distributionally robust formulation based on causal transport distance. Compared with divergence and Wasserstein metric, the causal transport distance is better at capturing the information structure revealed from the conditional distribution … Read more

Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition

This work analyzes the solution trajectory of gradient-based algorithms via a novel basis function decomposition. We show that, although solution trajectories of gradient-based algorithms may vary depending on the learning task, they behave almost monotonically when projected onto an appropriate orthonormal function basis. Such projection gives rise to a basis function decomposition of the solution … Read more

Sparse PCA With Multiple Components

Sparse Principal Component Analysis is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. At its heart, this involves solving a sparsity and orthogonality constrained convex maximization problem, which is extremely computationally challenging. Most existing work address sparse PCA via heuristics … Read more

Fixed-Point Automatic Differentiation of Forward–Backward Splitting Algorithms for Partly Smooth Functions

A large class of non-smooth practical optimization problems can be written as minimization of a sum of smooth and partly smooth functions. We consider such structured problems which also depend on a parameter vector and study the problem of differentiating its solution mapping with respect to the parameter which has far reaching applications in sensitivity … Read more