A vast number of causal inference studies use matching techniques, in which treatment cases are matched with similar control cases. For observational data in particular, we claim that these tests essentially ignore a major source of uncertainty: the way the assignments of matched pairs are constructed. It is entirely possible, for instance, that a study reporting an estimated treatment effect with a P-value of 0.0001 could be redone in almost the same way, with the same match quality, and yield a P-value well above 0.10, so that the result is no longer significant. Experimenters often deliberately choose not to consider the outcomes when designing the assignments; this allows for easier computation and clearer testing, but it does not account for possible biases in the way the assignments were constructed. What we would really like to be able to report is that, no matter which assignment we choose, the hypothesis test result still holds as long as the match is sufficiently good. We call this a robust matched pairs test, since its result is robust to the choice of assignment. In this paper, we provide methodology based on discrete optimization to create these robust tests. The method explores the full variation of results that one can obtain across all acceptable assignments. It demonstrates that one cannot always trust statistically significant results, even with a large number of matched pairs.
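The dependence of a test result on the choice of assignment can be illustrated with a small brute-force sketch. This is not the discrete optimization methodology of the paper: it simply enumerates every acceptable one-to-one assignment on a toy dataset and records the smallest and largest paired t-statistic. The data, the caliper rule used to define a "sufficiently good" match, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import permutations
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy data: (covariate, outcome) for each unit.
treated = [(1.0, 5.0), (2.0, 6.5), (3.0, 7.0), (4.0, 9.0)]
controls = [(1.1, 4.0), (1.9, 6.0), (2.1, 3.0),
            (3.1, 6.8), (3.9, 5.0), (4.2, 8.9)]
CALIPER = 1.0  # assumed rule: a pair is acceptable if covariates differ by at most this


def paired_t(diffs):
    """Paired t-statistic for the within-pair outcome differences."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def acceptable_assignments(treated, controls, caliper):
    """Yield each one-to-one assignment whose pairs all satisfy the caliper."""
    for chosen in permutations(range(len(controls)), len(treated)):
        if all(abs(t[0] - controls[c][0]) <= caliper
               for t, c in zip(treated, chosen)):
            yield chosen


def t_range(treated, controls, caliper):
    """Return (min t, max t, count) over all acceptable assignments."""
    stats = []
    for chosen in acceptable_assignments(treated, controls, caliper):
        # Treated outcome minus matched control outcome, one value per pair.
        diffs = [t[1] - controls[c][1] for t, c in zip(treated, chosen)]
        stats.append(paired_t(diffs))
    return min(stats), max(stats), len(stats)


lo, hi, n = t_range(treated, controls, CALIPER)
print(f"{n} acceptable assignments; t-statistic ranges from {lo:.2f} to {hi:.2f}")
```

Exhaustive enumeration is feasible only for a handful of units, since the number of assignments grows factorially; a wide gap between the two extremes is the kind of sensitivity to the assignment that a robust test is meant to expose.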
Noor-E-Alam, M. and Rudin, C., "Robust Testing for Causal Inference in Observational Studies", working paper.