A unified analysis of a class of proximal bundle methods for solving hybrid convex composite optimization problems

This paper presents a proximal bundle (PB) framework based on a generic bundle update scheme for solving the hybrid convex composite optimization (HCCO) problem and establishes a common iteration-complexity bound for any variant belonging to it. As a consequence, iteration-complexity bounds for three PB variants based on different bundle update schemes are obtained in the … Read more

Inexact bilevel stochastic gradient methods for constrained and unconstrained lower-level problems

Two-level stochastic optimization formulations have become instrumental in a number ofmachine learning contexts such as continual learning, neural architecture search, adversariallearning, and hyperparameter tuning. Practical stochastic bilevel optimization problemsbecome challenging in optimization or learning scenarios where the number of variables ishigh or there are constraints. In this paper, we introduce a bilevel stochastic gradient method … Read more

Regularized Step Directions in Conjugate Gradient Minimization for Machine Learning

Conjugate gradient minimization methods (CGM) and their accelerated variants are widely used in machine learning applications. We focus on the use of cubic regularization to improve the CGM direction independent of the steplength (learning rate) computation. Using Shanno’s reformulation of CGM as a memoryless BFGS method, we derive new formulas for the regularized step direction, … Read more

The equilateral small octagon of maximal width

A small polygon is a polygon of unit diameter. The maximal width of an equilateral small polygon with $n=2^s$ vertices is not known when $s \ge 3$. This paper solves the first open case and finds the optimal equilateral small octagon. Its width is approximately $3.24\%$ larger than the width of the regular octagon: $\cos(\pi/8)$. … Read more

Comparing Solution Paths of Sparse Quadratic Minimization with a Stieltjes Matrix

This paper studies several solution paths of sparse quadratic minimization problems as a function of the weighing parameter of the bi-objective of estimation loss versus solution sparsity. Three such paths are considered: the “L0-path” where the discontinuous L0-function provides the exact sparsity count; the “L1-path” where the L1-function provides a convex surrogate of sparsity count; … Read more

A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance

The Wasserstein distance has become increasingly important in machine learning and deep learning. Despite its popularity, the Wasserstein distance is hard to approximate because of the curse of dimensionality. A recently proposed approach to alleviate the curse of dimensionality is to project the sampled data from the high dimensional probability distribution onto a lower-dimensional subspace, … Read more

Projection Robust Wasserstein Barycenters

Collecting and aggregating information from several probability measures or histograms is a fundamental task in machine learning. One of the popular solution methods for this task is to compute the barycenter of the probability measures under the Wasserstein metric. However, approximating the Wasserstein barycenter is numerically challenging because of the curse of dimensionality. This paper … Read more

Dual descent ALM and ADMM

Classical primal-dual algorithms attempt to solve $\max_{\mu}\min_{x} \mathcal{L}(x,\mu)$ by alternatively minimizing over the primal variable $x$ through primal descent and maximizing the dual variable $\mu$ through dual ascent. However, when $\mathcal{L}(x,\mu)$ is highly nonconvex with complex constraints in $x$, the minimization over $x$ may not achieve global optimality, and hence the dual ascent step loses … Read more

Log-domain interior-point methods for convex quadratic programming

Applying an interior-point method to the central-path conditions is a widely used approach for solving quadratic programs. Reformulating these conditions in the log-domain is a natural variation on this approach that to our knowledge is previously unstudied. In this paper, we analyze log-domain interior-point methods and prove their polynomial-time convergence. We also prove that they … Read more

Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization

We consider unconstrained stochastic optimization problems with no available gradient information. Such problems arise in settings from derivative-free simulation optimization to reinforcement learning. We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a stochastic function using finite differences within a common random number framework. We develop modified versions of a norm … Read more