We consider stochastic optimization with side information, where covariate data are available prior to decision making and can inform better decisions. We propose a distributionally robust formulation based on the causal transport distance. Compared with divergences and the Wasserstein metric, the causal transport distance better captures the information structure revealed by the conditional distribution of the random problem parameters given the covariate values. We derive a dual reformulation for evaluating the worst-case expected cost and show that the worst-case distribution in a causal transport distance ball has a conditional information structure similar to that of the nominal distribution. When optimizing over affine decision rules, we identify cases in which the overall problem can be solved by convex programming. When optimizing over all (non-parametric) decision rules, we identify a new class of robust optimal decision rules for the case in which the cost function is convex in a one-dimensional decision variable.