Many real-world analytics problems involve two significant challenges: prediction and optimization. Due to the typically complex nature of each challenge, the standard paradigm is to predict, then optimize. By and large, machine learning tools are intended to minimize prediction error and do not account for how the predictions will be used in a downstream optimization problem. In contrast, we propose a new and very general framework, called Smart "Predict, then Optimize" (SPO), which directly leverages the optimization problem structure, i.e., its objective and constraints, for designing successful analytics tools. A key component of our framework is the SPO loss function, which measures the quality of a prediction by comparing the objective values of the solutions generated using the predicted and observed parameters, respectively. Training a model with respect to the SPO loss is computationally challenging, and therefore we also develop a surrogate loss function, called the SPO+ loss, which upper bounds the SPO loss, has desirable convexity properties, and is statistically consistent under mild conditions. We also propose a stochastic gradient descent algorithm which allows for situations in which the number of training samples is large, model regularization is desired, and/or the optimization problem of interest is nonlinear or integer. Finally, we perform computational experiments to empirically verify the success of our SPO framework in comparison to the standard predict-then-optimize approach.