Data-driven distributionally robust optimization is an emerging paradigm that seeks a solution driven by sample data yet protected against sampling errors. An increasingly popular approach, known as Wasserstein distributionally robust optimization (DRO), achieves this by applying the Wasserstein metric to construct a ball centred at the empirical distribution and finding a solution that performs well against the most adversarial distribution in the ball. In this paper, we present a general framework for studying different choices of a Wasserstein metric and point out the limitations of the existing choices. In particular, while choosing a Wasserstein metric of a higher order is desirable from a data-driven perspective, given its less conservative nature, such a choice comes at a high price from a robustness perspective: it is no longer applicable to many heavy-tailed distributions of practical concern. We show that this seemingly inevitable trade-off can be resolved within our framework, which introduces a new class of Wasserstein metrics, called coherent Wasserstein metrics. Like Wasserstein DRO, distributionally robust optimization using coherent Wasserstein metrics, termed generalized Wasserstein distributionally robust optimization (GW-DRO), enjoys all the desirable performance guarantees: finite-sample guarantee, asymptotic consistency, and computational tractability. The worst-case expectation problem in GW-DRO is in general a nonconvex optimization problem, yet we provide a new analysis that proves its tractability without relying on the common duality scheme. Our framework offers fruitful opportunities to design novel Wasserstein DRO models applicable in various contexts such as operations management, finance, and machine learning.
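To fix ideas, the standard Wasserstein DRO problem sketched above can be written as follows; the notation is illustrative rather than taken from the paper, with $f$ a loss function, $\widehat{P}_N$ the empirical distribution of $N$ samples, $W_p$ the order-$p$ Wasserstein metric, and $\varepsilon$ the radius of the ball:
\[
  \min_{x \in X} \; \sup_{P \,:\, W_p(P,\, \widehat{P}_N) \le \varepsilon} \; \mathbb{E}_{P}\!\left[ f(x, \xi) \right].
\]
The coherent Wasserstein metrics proposed in the paper would replace $W_p$ in the ball constraint; their precise construction is given in the body of the paper.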