Exact Penalty Function for L21 Norm Minimization over the Stiefel Manifold

L21 norm minimization with orthogonality constraints, feasible region of which is called Stiefel manifold, has wide applications in statistics and data science. The state-of-the-art approaches adopt proximal gradient technique on either Stiefel manifold or its tangent spaces. The consequent subproblem does not have closed-form solution and hence requires an iterative procedure to solve which is … Read more

Characterization of an Anomalous Behavior of a Practical Smoothing Technique

A practical smoothing method was analyzed and tested against state-of-the-art solvers for some non-smooth optimization problems in [BSS20a; BSS20b]. This method can be used to smooth the value functions and solution mappings of fully parameterized convex problems under mild conditions. In general, the smoothing of the value function lies from above the true value function … Read more

Manifold Identification for Ultimately Communication-Efficient Distributed Optimization

This work proposes a progressive manifold identification approach for distributed optimization with sound theoretical justifications to greatly reduce both the rounds of communication and the bytes communicated per round for partly-smooth regularized problems such as the $\ell_1$- and group-LASSO-regularized ones. Our two-stage method first uses an inexact proximal quasi-Newton method to iteratively identify a sequence … Read more

Gradient Sampling Methods with Inexact Subproblem Solves and Gradient Aggregation

Gradient sampling (GS) has proved to be an effective methodology for the minimization of objective functions that may be nonconvex and/or nonsmooth. The most computationally expensive component of a contemporary GS method is the need to solve a convex quadratic subproblem in each iteration. In this paper, a strategy is proposed that allows the use … Read more

Stochastic Variance-Reduced Prox-Linear Algorithms for Nonconvex Composite Optimization

We consider minimization of composite functions of the form $f(g(x))+h(x)$, where $f$ and $h$ are convex functions (which can be nonsmooth) and $g$ is a smooth vector mapping. In addition, we assume that $g$ is the average of finite number of component mappings or the expectation over a family of random component mappings. We propose … Read more

Geometry of First-Order Methods and Adaptive Acceleration

First-order operator splitting methods are ubiquitous among many fields through science and engineering, such as inverse problems, signal/image processing, statistics, data science and machine learning, to name a few. In this paper, we study a geometric property of first-order methods when applying to solve non-smooth optimization problems. With the tool of “partial smoothness”, we design … Read more

A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization

We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving empirical risk minimization (ERM) problems with a nonsmooth regularization term. Our algorithm is applicable to both the primal and the dual ERM problem. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or … Read more

A simple Newton method for local nonsmooth optimization

Superlinear convergence has been an elusive goal for black-box nonsmooth optimization. Even in the convex case, the subgradient method is very slow, and while some cutting plane algorithms, including traditional bundle methods, are popular in practice, local convergence is still sluggish. Faster variants depend either on problem structure or on analyses that elide sequences of … Read more

Numerical solution of generalized minimax problems

This contribution contains the description and investigation of four numerical methods for solving generalized minimax problems, which consists in the minimization of functions which are compositions of special smooth convex functions with maxima of smooth functions (the most important problem of this type is the sum of maxima of smooth functions). Section~1 is introductory. In … Read more

Trust-region methods for the derivative-free optimization of nonsmooth black-box functions

In this paper we study the minimization of a nonsmooth black-box type function, without assuming any access to derivatives or generalized derivatives and without any knowledge about the analytical origin of the function nonsmoothness. Directional methods have been derived for such problems but to our knowledge no model-based method like a trust-region one has yet … Read more