A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muon

A unified framework for first-order algorithms in nonconvex unconstrained optimization is proposed that uses adaptively preconditioned gradients and includes popular methods such as full and diagonal AdaGrad, AdaNorm, as well as adaptive variants of Shampoo and Muon. This framework also allows combining heterogeneous geometries across different groups of variables while preserving a unified convergence …

An Interpretable Ensemble Heuristic for Principal-Agent Games with Machine Learning

This paper addresses the challenge of enhancing public policy decision-making by efficiently solving principal-agent models (PAMs) for public-private partnerships, a critical yet computationally demanding problem. We develop a fast, interpretable, and generalizable approach to support policy decisions in these settings. We propose an interpretable ensemble heuristic (EH) that integrates Machine Learning (ML), Operations Research (OR), …

Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers

Artificial intelligence (AI) is moving increasingly beyond prediction to support decisions in complex, uncertain, and dynamic environments. This shift creates a natural intersection with operations research and management sciences (OR/MS), which have long offered conceptual and methodological foundations for sequential decision-making under uncertainty. At the same time, recent advances in deep learning, including feedforward neural …

Bilevel Learning

Bilevel learning refers to machine learning problems that can be formulated as bilevel optimization models, where decisions are organized in a hierarchical structure. This paradigm has recently gained considerable attention in machine learning, as gradient-based algorithms built on the implicit function reformulation have enabled the solution of large-scale problems involving possibly millions of variables. Despite …
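The implicit function reformulation the abstract refers to can be summarized as follows (a standard textbook derivation, stated here under the simplifying assumption that the upper-level objective $f$ depends on $\theta$ only through the lower-level solution). With

```latex
x^*(\theta) = \arg\min_x g(x,\theta), \qquad F(\theta) = f\bigl(x^*(\theta)\bigr),
```

differentiating the lower-level optimality condition $\nabla_x g(x^*(\theta),\theta)=0$ and applying the implicit function theorem (assuming $\nabla^2_{xx} g$ is invertible) yields the hypergradient

```latex
\nabla F(\theta)
= -\,\nabla^2_{\theta x} g \,\bigl(\nabla^2_{xx} g\bigr)^{-1} \nabla_x f
\;\Big|_{x = x^*(\theta)} .
```

Gradient-based bilevel methods scale this to millions of variables by approximating the inverse-Hessian-vector product, e.g. with conjugate gradients or truncated Neumann series, rather than forming $\nabla^2_{xx} g$ explicitly.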

Zeroth-Order Methods for Nonconvex-Strongly Concave Stochastic Minimax Problems with Decision-Dependent Distributions

Stochastic minimax problems with decision-dependent distributions (SMDD) have emerged as a crucial framework for modeling complex systems where data distributions drift in response to decision variables. Most existing methods for SMDD rely on an explicit functional relationship between the decision variables and the probability distribution. In this paper, we propose two sample-based zeroth-order algorithms, namely …
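The building block of sample-based zeroth-order methods is a gradient estimate formed purely from function evaluations. A common choice, sketched below as a generic illustration (not the paper's specific estimator), is the two-point Gaussian-smoothing estimate $\hat g = \frac{f(x+\mu u)-f(x-\mu u)}{2\mu}\,u$ with $u \sim \mathcal{N}(0, I)$:

```python
import numpy as np

def two_point_grad(f, x, mu=1e-4, seed=None):
    """Two-point zeroth-order gradient estimate along a random
    Gaussian direction u: ((f(x + mu*u) - f(x - mu*u)) / (2*mu)) * u."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

# Sanity check on a quadratic: averaging many estimates recovers the
# true gradient grad f(x) = x in expectation.
f = lambda z: 0.5 * np.sum(z**2)
x = np.array([1.0, -3.0, 2.0])
est = np.mean([two_point_grad(f, x, seed=i) for i in range(2000)], axis=0)
```

Such estimators are what make zeroth-order schemes attractive for decision-dependent distributions: they only need samples of the objective at perturbed decisions, not an explicit model of how the distribution depends on the decision.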

Data-driven Policies For Two-stage Stochastic Linear Programs

A stochastic program typically involves several parameters, including deterministic first-stage parameters and stochastic second-stage elements that serve as input data. These programs are re-solved whenever any input parameter changes. However, in practical applications, quick decision-making is necessary, and solving a stochastic program from scratch for every change in input data can be computationally costly. This …

A Projected Stochastic Gradient Method for Finite-Sum Problems with Linear Equality Constraints

A stochastic gradient method for finite-sum minimization subject to deterministic linear constraints is proposed and analyzed. The proposed procedure adapts the projected gradient method on a convex set to use both a stochastic gradient and a possibly inexact projection map. Under standard assumptions in the field of stochastic gradient methods, we provide theoretical …
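For linear equality constraints $Ax = b$, the Euclidean projection has the closed form $x - A^\top (A A^\top)^{-1}(Ax - b)$, which makes a projected stochastic gradient loop easy to sketch. The code below is a generic illustration of this scheme on a toy finite sum (an exact projection; the paper additionally covers inexact projection maps), not the authors' algorithm:

```python
import numpy as np

def project_affine(x, A, b):
    """Euclidean projection onto the affine set {x : A x = b}."""
    return x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)

def projected_sgd(grads, x0, A, b, lr=0.05, epochs=200, seed=0):
    """Projected stochastic gradient: sample one component gradient,
    take a step, project back onto the constraint set."""
    rng = np.random.default_rng(seed)
    x = project_affine(x0, A, b)
    for _ in range(epochs):
        for i in rng.permutation(len(grads)):
            x = project_affine(x - lr * grads[i](x), A, b)
    return x

# Toy finite sum: f(x) = sum_i 0.5*||x - c_i||^2 subject to sum(x) = 1.
# The constrained minimizer is the mean of the c_i, here (0.5, 0.5).
cs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
grads = [lambda x, c=c: x - c for c in cs]
A, b = np.ones((1, 2)), np.array([1.0])
x = projected_sgd(grads, np.zeros(2), A, b)
```

With a constant step size the iterates oscillate in a small neighborhood of the constrained minimizer, which is the behavior the theory for this class of methods quantifies.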

Linear Model Extraction via Factual and Counterfactual Queries

In model extraction attacks, the goal is to reveal the parameters of a black-box machine learning model by querying the model for a selected set of data points. Due to an increasing demand for explanations, this may involve counterfactual queries besides the typically considered factual queries. In this work, we consider linear models and three …
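The factual-query baseline is easy to make concrete: a linear model $f(x) = w^\top x + b$ in $d$ dimensions is exactly identified by $d+1$ queries, e.g. the origin and the standard basis vectors. The sketch below illustrates only this factual case (the counterfactual-query setting the abstract mentions is more involved):

```python
import numpy as np

def extract_linear(query, d):
    """Recover (w, b) of a black-box linear model f(x) = w.x + b from
    d + 1 factual queries: b = f(0) and w_i = f(e_i) - b."""
    b = query(np.zeros(d))
    w = np.array([query(np.eye(d)[i]) - b for i in range(d)])
    return w, b

# Hidden model that the attacker only sees through queries.
w_true, b_true = np.array([2.0, -1.0, 0.5]), 0.3
f = lambda x: float(w_true @ x + b_true)
w_hat, b_hat = extract_linear(f, 3)
```

Counterfactual queries return a point on (or near) the decision boundary instead of a raw model output, so each answer constrains the parameters differently from a factual evaluation.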

Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach

In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by developing interpretable and tractable decision rules that prescribe decisions using covariates. We first introduce the causal Sinkhorn discrepancy (CSD), an entropy-regularized causal Wasserstein distance that encourages continuous transport plans while …
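The entropic regularization underlying a Sinkhorn-type discrepancy can be computed with the standard Sinkhorn matrix-scaling iterations, sketched below for plain (non-causal) entropy-regularized optimal transport between two discrete marginals; the causal variant in the paper adds adaptedness constraints on the transport plan that this sketch does not impose:

```python
import numpy as np

def sinkhorn_plan(a, b, C, eps=0.1, iters=500):
    """Entropy-regularized optimal transport via Sinkhorn iterations:
    alternately rescale the Gibbs kernel K = exp(-C/eps) so the plan
    diag(u) K diag(v) matches the marginals a and b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Uniform marginals on 3 points with a squared-distance cost.
a = b = np.full(3, 1.0 / 3.0)
pts = np.array([0.0, 1.0, 2.0])
C = (pts[:, None] - pts[None, :]) ** 2
P = sinkhorn_plan(a, b, C)
```

The entropy term smooths the optimal plan, which is precisely the mechanism the abstract invokes for encouraging continuous transport plans.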

A Majorization-Minimization approach for multiclass classification in a big data scenario

This work presents a novel optimization approach for training linear classifiers in multiclass classification tasks, focusing on a regularized and smooth Weston-Watkins support vector machine (SVM) model. We propose a Majorization-Minimization (MM) algorithm to solve the resulting Lipschitz-differentiable optimization problem. To enhance scalability of the algorithm when tackling large datasets, we introduce an incremental …
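The MM principle the abstract invokes can be illustrated on any Lipschitz-differentiable objective: an $L$-smooth function admits the quadratic majorizer $f(y)+\nabla f(y)^\top(x-y)+\tfrac{L}{2}\|x-y\|^2$, and minimizing it in closed form gives the step $x \leftarrow y - \nabla f(y)/L$, with a guaranteed monotone decrease of $f$. The sketch below demonstrates this generic mechanism on a toy quadratic, not the paper's Weston-Watkins solver:

```python
import numpy as np

def mm_descent(f, grad, L, x0, iters=100):
    """Majorization-Minimization with the quadratic majorizer of an
    L-smooth function; its minimizer is the step x = y - grad(y)/L."""
    x = x0
    for _ in range(iters):
        x_new = x - grad(x) / L
        # The majorizer touches f at x, so each step cannot increase f.
        assert f(x_new) <= f(x) + 1e-12
        x = x_new
    return x

# Smooth toy objective f(x) = 0.5 * x.Q.x with L = largest eigenvalue of Q.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
L = np.linalg.eigvalsh(Q).max()
x = mm_descent(f, grad, L, np.array([1.0, 1.0]))
```

The monotone-descent property is what makes MM attractive for the smooth SVM setting, and incremental variants trade this exact majorization over the full dataset for cheaper per-iteration surrogates built from data subsets.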