Bolstering Stochastic Gradient Descent with Model Building

The stochastic gradient descent method and its variants are the core optimization algorithms that achieve good convergence rates for machine learning problems. These rates are obtained especially when the algorithms are fine-tuned for the application at hand. Although this tuning process can incur large computational costs, recent work has shown that these costs can be …
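
As a point of reference, here is a minimal plain-SGD loop in Python with a hand-tuned step size; the function names and the least-squares example are illustrative, and the paper's model-building scheme for adjusting the step is not reproduced.

```python
import numpy as np

def sgd(grad_sample, x0, data, lr=0.1, epochs=10, seed=0):
    """Plain SGD: one sampled-gradient step per data point.
    grad_sample(x, datum) returns a stochastic gradient at x;
    lr is the hand-tuned step size (learning rate)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            x -= lr * grad_sample(x, data[i])
    return x

# Illustrative least-squares fit: minimize (1/2n) sum_i (a_i @ x - b_i)^2
A = np.random.default_rng(1).normal(size=(100, 5))
b = A @ np.ones(5)
x_hat = sgd(lambda x, d: (d[0] @ x - d[1]) * d[0],
            np.zeros(5), list(zip(A, b)), lr=0.05, epochs=50)
```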

Regularized Step Directions in Conjugate Gradient Minimization for Machine Learning

Conjugate gradient minimization methods (CGM) and their accelerated variants are widely used in machine learning applications. We focus on the use of cubic regularization to improve the CGM direction independently of the steplength (learning rate) computation. Using Shanno’s reformulation of CGM as a memoryless BFGS method, we derive new formulas for the regularized step direction, …
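
A sketch of the standard memoryless BFGS direction that Shanno’s reformulation builds on, assuming the usual pairs s = x_k - x_{k-1} and y = g_k - g_{k-1}; the cubic-regularized formulas derived in the paper are not reproduced here.

```python
import numpy as np

def memoryless_bfgs_direction(g, s, y):
    """d = -H g, where H is the BFGS update of the identity using only
    the most recent pair (s, y); this is the memoryless BFGS view of CG."""
    rho = 1.0 / (y @ s)
    sg, yg = s @ g, y @ g
    Hg = g - rho * (sg * y + yg * s) + rho * sg * (1.0 + rho * (y @ y)) * s
    return -Hg
```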

A quasi-Newton method with Wolfe line searches for multiobjective optimization

We propose a BFGS method with Wolfe line searches for unconstrained multiobjective optimization problems. The algorithm is well defined even for general nonconvex problems. Global convergence and R-linear convergence to a Pareto optimal point are established for strongly convex problems. In the local convergence analysis, if the objective functions are locally strongly convex with Lipschitz …
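
For illustration, a standard bisection search for a step satisfying the (weak) Wolfe conditions in the single-objective case; the paper's multiobjective line search and BFGS updates are not reproduced, and all names here are illustrative.

```python
import numpy as np

def wolfe_line_search(f, grad, x, d, c1=1e-4, c2=0.9, max_iter=50):
    """Bisection search along a descent direction d for a step length
    satisfying the Armijo (sufficient decrease) and curvature conditions."""
    phi0, dphi0 = f(x), grad(x) @ d
    lo, hi, alpha = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if f(x + alpha * d) > phi0 + c1 * alpha * dphi0:
            hi = alpha                     # Armijo fails: step too long
        elif grad(x + alpha * d) @ d < c2 * dphi0:
            lo = alpha                     # curvature fails: step too short
        else:
            return alpha                   # both Wolfe conditions hold
        alpha = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
    return alpha
```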

An extended delayed weighted gradient algorithm for solving strongly convex optimization problems

The recently developed delayed weighted gradient method (DWGM) is competitive with the well-known conjugate gradient (CG) method for the minimization of strictly convex quadratic functions. Like the CG method, DWGM has key optimality and orthogonality properties that justify its practical performance. The main difference from the CG method is that, instead of …
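
For context, a textbook implementation of the CG baseline that DWGM is compared against, applied to min (1/2) x^T A x - b^T x with A symmetric positive definite; DWGM's own delayed update formulas are given in the paper and not reproduced here.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Textbook CG for the SPD linear system A x = b, i.e. for
    minimizing the strictly convex quadratic (1/2) x^T A x - b^T x."""
    x = np.zeros(len(b))
    r = b - A @ x            # residual, equal to minus the gradient
    p = r.copy()
    for _ in range(len(b)):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return x
```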

Two limited-memory optimization methods with minimum violation of the previous quasi-Newton equations

Limited-memory variable metric methods based on the well-known BFGS update are widely used for large-scale optimization. The block version of the BFGS update, derived by Schnabel (1983), Hu and Storey (1991), and Vlček and Lukšan (2019), satisfies the quasi-Newton equations with all used difference vectors and, for quadratic objective functions, gives the best improvement …
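
For reference, the standard L-BFGS two-loop recursion that such limited-memory methods are built around; the paper's block updates with minimum violation of the previous quasi-Newton equations are not reproduced here.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Two-loop recursion: d = -H g from the m most recent pairs
    (s_i, y_i), with the usual gamma-scaled identity as H0."""
    q = np.array(g, dtype=float)
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)            # H0 = gamma * I scaling
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return -q
```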

Derivative-free separable quadratic modeling and cubic regularization for unconstrained optimization

We present a derivative-free separable quadratic modeling and cubic regularization technique for solving smooth unconstrained minimization problems. The derivative-free approach is mainly concerned with building a quadratic model that can be generated by numerical interpolation or a minimum Frobenius norm approach when the number of available points does not allow building a complete …
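
A minimal derivative-free sketch of a separable (diagonal) quadratic model built from central differences; this stands in for the paper's interpolation and minimum Frobenius norm constructions, and the cubic regularization step is omitted.

```python
import numpy as np

def separable_quadratic_model(f, x, h=1e-4):
    """Fit m(s) = f(x) + g^T s + (1/2) sum_i d_i s_i^2 using only
    function values at x and at x +/- h e_i (no derivatives)."""
    n, fx = len(x), f(x)
    g, d = np.empty(n), np.empty(n)
    for i in range(n):
        e = np.zeros(n); e[i] = h
        fp, fm = f(x + e), f(x - e)
        g[i] = (fp - fm) / (2 * h)         # central-difference gradient
        d[i] = (fp - 2 * fx + fm) / h**2   # diagonal curvature estimate
    return fx, g, d
```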

A study of Liu-Storey conjugate gradient methods for vector optimization

This work presents a study of Liu-Storey (LS) nonlinear conjugate gradient (CG) methods for solving vector optimization problems. Three variants of the LS-CG method, originally designed to solve single-objective problems, are extended to the vector setting. The first algorithm restricts the LS conjugate parameter to be nonnegative and uses a sufficiently accurate line search satisfying …
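
In the single-objective case, the LS conjugate parameter and its restriction to nonnegative values read as follows (a sketch; the vector-valued extensions are the subject of the paper).

```python
import numpy as np

def ls_direction(g_new, g_old, d_old, restrict_nonneg=True):
    """Liu-Storey CG direction with
    beta^LS = g_{k+1}^T (g_{k+1} - g_k) / (-d_k^T g_k),
    optionally truncated at zero as in the first variant above."""
    beta = (g_new @ (g_new - g_old)) / (-(d_old @ g_old))
    if restrict_nonneg:
        beta = max(beta, 0.0)
    return -g_new + beta * d_old
```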

Adaptive Regularization Minimization Algorithms with Non-Smooth Norms

A regularization algorithm (AR1pGN) for unconstrained nonlinear minimization is considered, which uses a model consisting of a Taylor expansion of arbitrary degree and a regularization term involving a possibly non-smooth norm. It is shown that the non-smoothness of the norm does not affect the O(\epsilon_1^{-(p+1)/p}) upper bound on the evaluation complexity of finding first-order \epsilon_1-approximate minimizers …
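
A much-simplified sketch of one adaptive regularization iteration for the special case p = 2 with the smooth Euclidean norm, solving the subproblem with a generic solver; the parameters eta and gamma and the solver choice are illustrative, and AR1pGN's arbitrary-degree, non-smooth-norm machinery is not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

def ar2_step(f, grad, hess, x, sigma, eta=0.1, gamma=2.0):
    """One step of adaptive cubic regularization (p = 2): minimize the
    regularized Taylor model, then accept/reject and update sigma."""
    g, H = grad(x), hess(x)
    def model(s):
        return (f(x) + g @ s + 0.5 * s @ H @ s
                + (sigma / 3.0) * np.linalg.norm(s) ** 3)
    s = minimize(model, np.zeros_like(x)).x
    predicted = f(x) - model(s)
    actual = f(x) - f(x + s)
    if predicted > 0 and actual / predicted >= eta:
        return x + s, max(sigma / gamma, 1e-8)   # successful: relax sigma
    return x, gamma * sigma                      # unsuccessful: regularize more
```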

Optimal Convergence Rates for the Proximal Bundle Method

We study convergence rates of the classic proximal bundle method for a variety of nonsmooth convex optimization problems. We show that, without any modification, this algorithm adapts to converge faster in the presence of smoothness or a Hölder growth condition. Our analysis reveals that with a constant stepsize, the bundle method is adaptive, yet it …
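
One proximal step of the bundle method can be written as a small convex program over the current cutting-plane model; below is a sketch using cvxpy, where cuts is a list of (point, value, subgradient) triples and t is the constant stepsize discussed above. The serious/null step logic and the rates analyzed in the paper are not reproduced.

```python
import numpy as np
import cvxpy as cp

def proximal_bundle_step(cuts, center, t):
    """Minimize max_i [f(z_i) + g_i^T (x - z_i)] + (1/2t) ||x - center||^2,
    i.e. the cutting-plane model plus the proximal term."""
    x = cp.Variable(len(center))
    planes = [fz + gz @ (x - z) for z, fz, gz in cuts]
    obj = cp.max(cp.hstack(planes)) + cp.sum_squares(x - center) / (2 * t)
    cp.Problem(cp.Minimize(obj)).solve()
    return x.value
```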

New Bregman proximal type algorithms for solving DC optimization problems

Difference of Convex (DC) optimization problems have objective functions that are differences of two convex functions. Representative methods for solving these problems are the proximal DC algorithms, which require the convex part of the objective function to be L-smooth. In this article, we propose the Bregman Proximal DC Algorithm (BPDCA) for solving large-scale DC optimization …
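
For comparison, the classical proximal DC step that BPDCA generalizes: linearize the concave part -h at the current point and take a proximal step on g. The Bregman version replaces the squared Euclidean distance inside the prox with a Bregman distance, which is how the L-smoothness requirement is relaxed. The toy example below is illustrative.

```python
import numpy as np

def proximal_dc_step(prox_g, grad_h, x, lam):
    """Classical proximal DC iteration for f = g - h:
    x+ = argmin_u g(u) - <grad h(x), u> + (1/(2*lam)) ||u - x||^2
       = prox_{lam*g}(x + lam * grad_h(x))."""
    return prox_g(x + lam * grad_h(x), lam)

# Toy DC instance: g = ||.||_1 (prox = soft-thresholding), h = 0.5 ||.||^2
soft = lambda v, lam: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
x_next = proximal_dc_step(soft, lambda z: z, np.array([2.0, -0.5]), lam=0.5)
```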