Optimization in Data Science – Optimization Online

A Generalization Result for Convergence in Learning-to-Optimize

Published: 2024/10/10

Convex and Nonsmooth Optimization, Nonsmooth Optimization, Optimization in Data Science learning-to-optimize, pac-bayes, stationary points

Convergence in learning-to-optimize is hardly studied, because conventional convergence guarantees in optimization are based on geometric arguments, which cannot be applied easily to learned algorithms. Thus, we develop a probabilistic framework that resembles deterministic optimization and allows for transferring geometric arguments into learning-to-optimize. Our main theorem is a generalization result for parametric classes of potentially … Read more

Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits

Published: 2024/10/07

Mengmeng Li

Daniel Kuhn

Bahar Tașkesen

Convex Optimization, Data Science Algorithms, Data Science Applications bandits, discrete choice models, online learning

Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems and allow for a streamlined analysis. However, FTRL algorithms require the solution of an optimization problem in every iteration and are thus computationally challenging. In contrast, Follow-The-Perturbed-Leader (FTPL) algorithms achieve computational efficiency by perturbing the estimates of the rewards … Read more
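The contrast between FTRL and FTPL drawn in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm; it only demonstrates the well-known Gumbel-max identity: adding independent standard Gumbel noise to scaled reward estimates and picking the largest entry samples an arm with exactly the exponential-weights (softmax) probabilities that FTRL with an entropic regularizer would produce. The function names, the scaling parameter `eta`, and the example estimates are all assumptions made for illustration.

```python
import math
import random


def gumbel(rng):
    # Standard Gumbel sample: -log(E) with E ~ Exp(1).
    # The max() guards against the (extremely unlikely) case E == 0.
    return -math.log(max(rng.expovariate(1.0), 1e-300))


def ftpl_choice(reward_estimates, eta, rng):
    # Follow the perturbed leader: perturb the scaled estimates, take the argmax.
    perturbed = [eta * r + gumbel(rng) for r in reward_estimates]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def softmax(reward_estimates, eta):
    # Exponential-weights probabilities (FTRL with an entropic regularizer).
    m = max(reward_estimates)  # subtract the max for numerical stability
    weights = [math.exp(eta * (r - m)) for r in reward_estimates]
    total = sum(weights)
    return [w / total for w in weights]


def empirical_ftpl_probs(reward_estimates, eta, n_samples, seed=0):
    # Estimate the FTPL arm-selection probabilities by repeated sampling.
    rng = random.Random(seed)
    counts = [0] * len(reward_estimates)
    for _ in range(n_samples):
        counts[ftpl_choice(reward_estimates, eta, rng)] += 1
    return [c / n_samples for c in counts]
```

Comparing `empirical_ftpl_probs(est, eta, n)` with `softmax(est, eta)` for a large `n` shows the two agree up to sampling noise. The perturbed version needs only one noise draw per arm and an argmax, with no optimization problem to solve.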

Single-Loop Deterministic and Stochastic Interior-Point Algorithms for Nonlinearly Constrained Optimization

Published: 2024/08/28

Frank E. Curtis

Qi Wang

Xin Jiang

Constrained Nonlinear Optimization, Nonlinear Optimization, Optimization in Data Science

An interior-point algorithm framework is proposed, analyzed, and tested for solving nonlinearly constrained continuous optimization problems. The main setting of interest is when the objective and constraint functions may be nonlinear and/or nonconvex, and when constraint values and derivatives are tractable to compute, but objective function values and derivatives can only be estimated. The algorithm … Read more

A Markovian Model for Learning-to-Optimize

Published: 2024/08/21

Michael Sucker

Peter Ochs

Optimization in Data Science, Stochastic Approaches, Stochastic Programming convergence rate, learning-to-optimize, pac-bayes, stochastic processes, stopping time

We present a probabilistic model for stochastic iterative algorithms with the use case of optimization algorithms in mind. Based on this model, we present PAC-Bayesian generalization bounds for functions that are defined on the trajectory of the learned algorithm, for example, the expected (non-asymptotic) convergence rate and the expected time to reach the stopping criterion. … Read more

Forecasting Urban Traffic States with Sparse Data Using Hankel Temporal Matrix Factorization

Published: 2024/08/13

Chun Cheng

Data Science Algorithms, Data Science Applications, Transportation hankel matrix, machine learning, matrix factorization, traffic state forecasting, Urban transportation network

Forecasting urban traffic states is crucial to transportation network monitoring and management, playing an important role in the decision-making process. Despite the substantial progress that has been made in developing accurate, efficient, and reliable algorithms for traffic forecasting, most existing approaches fail to handle sparsity, high-dimensionality, and nonstationarity in traffic time series and seldom consider … Read more

Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks

Published: 2024/07/19

Anirbit Mukherjee

Data Science Algorithms, Global Optimization Theory, Nonlinear Optimization

In this work, we instantiate a regularized form of the gradient clipping algorithm and prove that it can converge to the global minima of deep neural network loss functions provided that the net is of sufficient width. We present empirical evidence that our theoretically founded regularized gradient clipping algorithm is also competitive with the state-of-the-art … Read more
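For orientation, a minimal sketch of standard norm-based gradient clipping follows. It is the plain textbook variant, not the regularized form analyzed in the paper, whose details are not given in this excerpt. The function names and the parameters `max_norm` and `lr` are assumptions chosen for illustration.

```python
import math


def clip_gradient(grad, max_norm):
    # Rescale the gradient so its Euclidean norm does not exceed max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)  # already within the threshold: leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]


def clipped_gd_step(params, grad, lr, max_norm):
    # One gradient-descent step using the clipped gradient.
    clipped = clip_gradient(grad, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)]
```

Clipping keeps the direction of the gradient and only limits the step length, so the update stays bounded by `lr * max_norm` regardless of how large the raw gradient is.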

Predictive Low Rank Matrix Learning under Partial Observations: Mixed-Projection ADMM

Published: 2024/07/18, Updated: 2024/10/02

Dimitris Bertsimas

Nicholas A. G. Johnson

Data Science Algorithms, Nonlinear Optimization admm, low rank, matrix completion, mixed-projection

We study the problem of learning a partially observed matrix under the low rank assumption in the presence of fully observed side information that depends linearly on the true underlying matrix. This problem consists of an important generalization of the Matrix Completion problem, a central problem in Statistics, Operations Research and Machine Learning, that … Read more

Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls

Published: 2024/07/18, Updated: 2024/10/18

Aras Selvi

Data Science Theory, Robust Optimization

Adversarially robust optimization (ARO) has become the de facto standard for training models to defend against adversarial attacks during testing. However, despite their robustness, these models often suffer from severe overfitting. To mitigate this issue, several successful approaches have been proposed, including replacing the empirical distribution in training with: (i) a worst-case distribution within an … Read more

A Stochastic Objective-Function-Free Adaptive Regularization Method with Optimal Complexity

Published: 2024/07/10, Updated: 2024/07/19

Serge Gratton

Sadok Jerad

Philippe L. Toint

Data Science Algorithms, Nonlinear Optimization adaptive regularization methods, evaluation complexity, nonconvex optimization, objective-function-free optimization, stochastic optimization

A fully stochastic second-order adaptive-regularization method for unconstrained nonconvex optimization is presented which never computes the objective-function value, but yet achieves the optimal $\mathcal{O}(\epsilon^{-3/2})$ complexity bound for finding first-order critical points. The method is noise-tolerant and the inexactness conditions required for convergence depend on the history of past steps. Applications to cases where derivative … Read more

Why Line Search when you can Plane Search?

Published: 2024/06/25

Betty Shea

Network Optimization, Optimization in Data Science

We introduce the class of SO-friendly neural networks, which includes several models used in practice, including networks with 2 layers of hidden weights where the number of inputs is larger than the number of outputs. SO-friendly networks have the property that performing a precise line search to set the step size on each iteration has the … Read more