Optimization in Data Science – Page 6

Inexact Proximal-Gradient Methods with Support Identification

Published: 2022/11/03

Convex and Nonsmooth Optimization, Data Science Algorithms inexact proximal-gradient method, overlapping group regularizer, structured sparsity, support identification, worst-case iteration-complexity

\(\) We consider the proximal-gradient method for minimizing an objective function that is the sum of a smooth function and a non-smooth convex function. A feature that distinguishes our work from most in the literature is that we assume that the associated proximal operator does not admit a closed-form solution. To address this challenge, we … Read more

Simple Alternating Minimization Provably Solves Complete Dictionary Learning

Published: 2022/10/24

Optimization in Data Science

This paper focuses on complete dictionary learning problem, where the goal is to reparametrize a set of given signals as linear combinations of atoms from a learned dictionary. There are two main challenges faced by theoretical and practical studies of dictionary learning: the lack of theoretical guarantees for practically-used heuristic algorithms, and their poor scalability … Read more

Decentralized Stochastic Bilevel Optimization with Improved Per-Iteration Complexity

Published: 2022/10/23

Xuxing Chen

Minhui Huang

Shiqian Ma

Krishnakumar Balasubramanian

Nonlinear Optimization, Optimization in Data Science, Stochastic Programming

Bilevel optimization recently has received tremendous attention due to its great success in solving important machine learning problems like meta learning, reinforcement learning, and hyperparameter optimization. Extending single-agent training on bilevel problems to the decentralized setting is a natural generalization, and there has been a flurry of work studying decentralized bilevel optimization algorithms. However, it … Read more

Wasserstein Regularization for 0-1 Loss

Published: 2022/10/17

Rui Gao

Data Science Theory, Robust Optimization, Stochastic Programming

Wasserstein distributionally robust optimization (DRO) finds robust solutions by hedging against data perturbation specified by distributions in a Wasserstein ball. The robustness is linked to the regularization effect, which has been studied for continuous losses in various settings. However, existing results cannot be simply applied to the 0-1 loss, which is frequently seen in uncertainty … Read more

Decision-making with Side Information: A Causal Transport Robust Approach

Published: 2022/10/16, Updated: 2024/05/05

Optimization in Data Science, Robust Optimization, Stochastic Programming adjustable robust optimization, distributionally robust optimization (DRO), end-to-end learning, optimal transport

We consider stochastic optimization with side information where, prior to decision making, covariate data are available to inform better decisions. In particular, we propose to consider a distributionally robust formulation based on causal transport distance. Compared with divergence and Wasserstein metric, the causal transport distance is better at capturing the information structure revealed from the conditional distribution … Read more

Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition

Published: 2022/10/02

Jianhao Ma

Lingjun Guo

Salar Fattahi

Optimization in Data Science

This work analyzes the solution trajectory of gradient-based algorithms via a novel basis function decomposition. We show that, although solution trajectories of gradient-based algorithms may vary depending on the learning task, they behave almost monotonically when projected onto an appropriate orthonormal function basis. Such projection gives rise to a basis function decomposition of the solution … Read more

Sparse PCA With Multiple Components

Published: 2022/09/29, Updated: 2023/10/31

Ryan Cory-Wright

Jean Pauphilet

Data Science Theory, Data-Mining, Semi-definite Programming

Sparse Principal Component Analysis (sPCA) is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. This involves solving a sparsity and orthogonality-constrained convex maximization problem, which is extremely computationally challenging. Most existing works address sparse PCA via methods—such as iteratively computing … Read more

Fixed-Point Automatic Differentiation of Forward–Backward Splitting Algorithms for Partly Smooth Functions

Published: 2022/09/29

Sheheryar Mehmood

Peter Ochs

Convex and Nonsmooth Optimization, Optimization in Data Science bilevel optimization, convex optimization, first-order methods, partial smoothness, sensitivity analysis

A large class of non-smooth practical optimization problems can be written as minimization of a sum of smooth and partly smooth functions. We consider such structured problems which also depend on a parameter vector and study the problem of differentiating its solution mapping with respect to the parameter which has far reaching applications in sensitivity … Read more

On polynomial time solvability of combinatorial Markov random fields

Published: 2022/09/24

Shaoning Han

Andrés Gómez

Jong-Shi Pang

(Mixed) Integer Nonlinear Programming, Data Science Applications

The problem of inferring Markov random fields (MRFs) with a sparsity or robustness prior can be naturally modeled as a mixed-integer program. This motivates us to study a general class of convex submodular optimization problems with indicator variables, which we show to be polynomially solvable in this paper. The key insight is that, possibly after … Read more

Optimized convergence of stochastic gradient descent by weighted averaging

Published: 2022/09/23, Updated: 2022/10/05

Melinda Hagedorn

Florian Jarre

Convex Optimization, Data Science Theory, Stochastic Approaches convex optimization, noise, optimal step lengths, optimal weights, stochastic gradient descent, weighted averaging

Under mild assumptions stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the arithmetic mean of all iterates converges considerably slower to the optimal solution than the iterates themselves. And also in the … Read more