Non-Convex Self-Concordant Functions: Practical Algorithms and Complexity Analysis

We extend the standard notion of self-concordance to non-convex optimization and develop a family of second-order algorithms with global convergence guarantees. In particular, two function classes – weakly self-concordant functions and F-based self-concordant functions – generalize the self-concordant framework beyond convexity, without assuming the Lipschitz continuity of the gradient or Hessian. For these function classes, … Read more
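For context (not spelled out in the truncated abstract), the classical self-concordance inequality that such extensions start from is, for a three-times differentiable convex $f$ and any direction $h$:

$$\left| D^3 f(x)[h,h,h] \right| \le 2 \left( D^2 f(x)[h,h] \right)^{3/2}.$$

How the weakly self-concordant and F-based classes relax this condition for non-convex $f$ is detailed in the paper itself.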

Iteration complexity of the Difference-of-Convex Algorithm for unconstrained optimization: a simple proof

We propose a simple proof of the worst-case iteration complexity of the Difference of Convex functions Algorithm (DCA) for unconstrained minimization, showing that the norm of the objective function’s gradient at the iterates converges to zero at a global rate of $o(1/k)$. A small example is also provided indicating that the rate cannot … Read more
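To illustrate the DCA iteration itself (this toy decomposition is my own, not the paper's example), take $f(x) = x^4 - x^2$ written as the difference of the convex functions $g(x) = x^4$ and $h(x) = x^2$. DCA linearizes $h$ at the current iterate and minimizes the resulting convex surrogate, which here has a closed-form solution:

```python
import numpy as np

def dca(x0, iters=50):
    """DCA on the toy DC function f(x) = x^4 - x^2 with g(x) = x^4, h(x) = x^2.

    Each step minimizes g(x) - h'(x_k) * x, i.e. solves 4 x^3 = 2 x_k,
    giving the closed-form update x_{k+1} = (x_k / 2)^(1/3).
    """
    x = float(x0)
    for _ in range(iters):
        x = np.cbrt(x / 2.0)
    return x

x = dca(1.0)
# Stationarity: f'(x) = 4 x^3 - 2 x = 0 holds at x = 1/sqrt(2) for x > 0.
```

Starting from $x_0 = 1$, the iterates converge to the stationary point $1/\sqrt{2}$, consistent with DCA's guarantee of driving the (sub)gradient to zero.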

A Majorization-Minimization approach for multiclass classification in a big data scenario

This work presents a novel optimization approach for training linear classifiers in multiclass classification tasks, focusing on a regularized and smooth Weston-Watkins support vector machine (SVM) model. We propose a Majorization-Minimization (MM) algorithm to solve the resulting Lipschitz-differentiable optimization problem. To enhance the scalability of the algorithm when tackling large datasets, we introduce an incremental … Read more
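As a generic illustration of the MM principle (a minimal sketch with a toy quadratic objective, not the paper's Weston-Watkins surrogate): for a function with $L$-Lipschitz gradient, the quadratic upper bound $f(x) \le f(x_k) + \langle \nabla f(x_k), x - x_k \rangle + \tfrac{L}{2}\|x - x_k\|^2$ is a valid majorizer, and minimizing it yields a gradient step:

```python
import numpy as np

def mm_quadratic(grad, L, x0, iters=100):
    """Majorization-Minimization with a quadratic surrogate.

    For f with an L-Lipschitz gradient, minimizing the quadratic majorizer
    at x_k gives the closed-form update x_{k+1} = x_k - grad(x_k) / L,
    which decreases f monotonically.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - grad(x) / L
    return x

# Toy smooth objective: f(x) = 0.5 x^T A x, whose gradient is A x and whose
# Lipschitz constant is the largest eigenvalue of A.
A = np.diag([2.0, 1.0])
x = mm_quadratic(lambda v: A @ v, 2.0, np.array([1.0, 1.0]))
```

In the incremental variant the abstract alludes to, the surrogate would be built from a subset of the data at each step; the sketch above keeps the full-batch form for clarity.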

A speed up strategy for gradient methods

In this paper, we propose a new acceleration strategy for gradient-based methods applied to strictly convex Quadratic Programming (QP) problems. The strategy consists of performing, at selected iterations, minimization steps along alternative descent directions or even within low-dimensional affine subspaces. In particular, considering the contribution of the linear and quadratic part of the objective function … Read more
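The baseline such strategies accelerate is the gradient method with exact line search on a strictly convex quadratic, sketched below (the matrix and right-hand side are illustrative choices, not from the paper):

```python
import numpy as np

def steepest_descent_qp(A, b, x0, iters=200):
    """Gradient method with exact line search on f(x) = 0.5 x^T A x - b^T x.

    For a symmetric positive definite A, the gradient is g = A x - b and the
    exact minimizing step along -g is alpha = (g^T g) / (g^T A g).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = A @ x - b
        gAg = g @ (A @ g)
        if gAg == 0.0:
            break  # gradient is zero: x is the minimizer
        x = x - (g @ g) / gAg * g
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, 1.0])
x = steepest_descent_qp(A, b, np.zeros(2))
# The minimizer satisfies the optimality condition A x = b.
```

This plain method converges linearly at a rate governed by the condition number of $A$, which is precisely what extra steps along alternative directions or low-dimensional subspaces aim to improve.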

Primal-dual resampling for solution validation in convex stochastic programming

Suppose we wish to determine the quality of a candidate solution to a convex stochastic program in which the objective function is a statistical functional parameterized by the decision variable and known deterministic constraints may be present. Inspired by stopping criteria in primal-dual and interior-point methods, we develop cancellation theorems that characterize the convergence of … Read more

Facial reduction for nice (and non-nice) convex programs

Consider the primal problem of minimizing the sum of two closed proper convex functions \(f\) and \(g\). If the relative interiors of the domains of \(f\) and \(g\) intersect, then the primal problem and its corresponding Fenchel dual satisfy strong duality. When these relative interiors fail to intersect, pathologies and numerical difficulties may occur. In … Read more

An Elementary Proof of the Near Optimality of LogSumExp Smoothing

We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln(\sum_{i=1}^d\exp(x_i))$ provides a classical smoothing, differing from the max function in value by at most $\ln(d)$. We provide an elementary construction of a lower bound, establishing that every overestimating smoothing of the max function must … Read more
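The $\ln(d)$ gap is easy to check numerically (a quick sanity check, not part of the paper's argument); the shift-by-max trick below is the standard way to evaluate LogSumExp stably:

```python
import numpy as np

def logsumexp(x):
    """Numerically stable LogSumExp: f(x) = ln(sum_i exp(x_i)).

    Shifting by the max before exponentiating avoids overflow and leaves
    the value unchanged, since f(x) = max(x) + f(x - max(x)).
    """
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([1.0, 2.0, 3.0, 4.0])
gap = logsumexp(x) - np.max(x)
# LogSumExp overestimates max(x), by at most ln(d) (attained when all
# coordinates of x are equal).
```

The gap is largest for constant vectors, where every term of the sum contributes equally and $f(x) = \max(x) + \ln(d)$ exactly.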

Robust optimality for nonsmooth mathematical programs with equilibrium constraints under data uncertainty

We develop a unified framework for robust nonsmooth optimization problems with equilibrium constraints (UNMPEC). As a foundation, we study a robust nonsmooth nonlinear program with uncertainty in both the objective function and the inequality constraints (UNP). Using Clarke subdifferentials, we establish Karush–Kuhn–Tucker (KKT)-type necessary optimality conditions under an extended no-nonzero-abnormal-multiplier constraint qualification (ENNAMCQ). When the … Read more

New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes

In this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of … Read more
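The Polyak stepsize itself is simple to state: $t_k = (f(x_k) - f^*)/\|\nabla f(x_k)\|^2$, which requires knowing the optimal value $f^*$. A minimal sketch on a toy strongly convex function (my own example, not one from the paper):

```python
import numpy as np

def polyak_gd(f, grad, f_star, x0, iters=100):
    """Gradient descent with the Polyak stepsize.

    Each step uses t_k = (f(x_k) - f*) / ||g_k||^2, where f* is the known
    optimal value of f.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gn2 = g @ g
        if gn2 == 0.0:
            break  # gradient is zero: x is stationary
        x = x - (f(x) - f_star) / gn2 * g
    return x

# Toy problem: f(x) = 0.5 ||x||^2, with gradient x and optimal value f* = 0.
x = polyak_gd(lambda v: 0.5 * v @ v, lambda v: v, 0.0, np.array([3.0, -4.0]))
```

On this example each step contracts the iterate by a constant factor, so the method converges linearly; the paper's contribution concerns how tight such rates are across broader function classes.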

Data-Dependent Complexity of First-Order Methods for Binary Classification

Large-scale problems in data science are often modeled with optimization, and the optimization model is usually solved with first-order methods that may converge at a sublinear rate. Therefore, it is of interest to terminate the optimization algorithm as soon as the underlying data science task is accomplished. We consider FISTA for solving two binary classification … Read more