We consider stochastic optimization with side information where, prior to decision-making, covariate data are available to inform better decisions. To hedge against data uncertainty while capturing the information structure revealed by the conditional distribution of the random problem parameters given the covariate values, we propose a distributionally robust formulation based on the causal transport distance. We derive a dual reformulation for evaluating the worst-case expected cost and show that the worst-case distribution in a causal transport distance ball preserves the conditional information structure of the nominal distribution. When optimizing over affine decision rules, we identify cases in which the overall problem can be solved via convex programming. When optimizing over all (non-parametric) decision rules, we identify a new class of robust optimal decision rules for the case in which the cost function is convex with respect to a one-dimensional decision variable.