We describe the fundamentals of algorithms for minimizing a smooth nonlinear function, and extensions of these methods to the sum of a smooth function and a convex nonsmooth function. Such objective functions are ubiquitous in data analysis applications, as we illustrate using several examples. We discuss methods that make use of gradient (first-order) information about the smooth part of the function, and also Newton methods that make use of Hessian (second-order) information. Convergence and complexity theory is outlined for each approach.
To appear in "Mathematics of Data," AMS / Park City Mathematics Institute Series, 2017.
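As a concrete instance of the first-order methods described above, the sketch below applies a proximal-gradient (ISTA) iteration to a composite objective of the form discussed in the abstract: a smooth least-squares term plus a convex nonsmooth \ell_1 penalty, 0.5*||Ax - b||^2 + lam*||x||_1. The problem data `A`, `b`, the penalty `lam`, and the step size `alpha = 1/L` (with `L` the Lipschitz constant of the smooth gradient) are illustrative assumptions, not taken from the text.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1, applied componentwise.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, lam, alpha, iters=2000):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by proximal gradient (ISTA).

    Each iteration takes a gradient step on the smooth part, then applies
    the prox of the nonsmooth part (soft-thresholding).
    """
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                    # gradient of the smooth term
        x = soft_threshold(x - alpha * grad, alpha * lam)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))
    x_true = np.array([1.0, 0.0, -2.0, 0.0, 0.0])   # sparse ground truth
    b = A @ x_true
    L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of grad
    x = proximal_gradient(A, b, lam=0.1, alpha=1.0 / L)
    print(np.round(x, 3))
```

The step size 1/L guarantees descent on the smooth part; with Hessian information one could instead take (proximal) Newton steps, at higher per-iteration cost.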