Friday, April 22, 2022

Alleviating bias from testing pre-trends in event study designs

Jonathan Roth has a recent paper, Pre-test with Caution: Event-study Estimates After Testing for Parallel Trends, dealing with event study designs that rely on the "parallel trends" assumption (i.e., in the absence of treatment, the treated units would have moved in parallel with the control units). He makes the case that (1) many tests for parallel trends in the pre-treatment period in the literature have low power, and (2) there are often correlations between the coefficients in the pre-period (where you test for parallel trends) and those in the post-period (where you estimate treatment effects). Choosing to move forward with the analysis only when the former pass the test can then bias the latter. This is a subtle case of the common problem of data-dependent decision making that affects lots of modelling tasks.
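To see the mechanics, here is a small simulation of my own construction (not code from the paper) in which parallel trends is violated by a linear drift. Because the pre- and post-period coefficients share the same base period, conditioning on an insignificant pre-trend both reflects the low power of the test and shifts the post-period estimate away from its unconditional mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy simulation. diff_t is the treated-minus-control mean gap in period
# t; the true treatment effect is zero, but parallel trends is violated
# by a linear drift of `delta` per period. Coefficients are taken
# relative to base period t = 0, so base-period noise enters both the
# pre-period coefficient (used for the test) and the post-period
# coefficient (used for estimation), correlating the two.
delta, sigma, n_sims = 1.0, 1.0, 500_000

eps = rng.normal(0.0, sigma, size=(n_sims, 3))   # noise at t = -1, 0, +1
diff = delta * np.array([-1.0, 0.0, 1.0]) + eps  # observed group gaps
beta_pre = diff[:, 0] - diff[:, 1]               # coefficient at t = -1
beta_post = diff[:, 2] - diff[:, 1]              # coefficient at t = +1

se = sigma * np.sqrt(2)                          # sd of each coefficient
passed = np.abs(beta_pre / se) < 1.96            # pre-test: "no pre-trend"

print(f"power of pre-test against drift:  {1 - passed.mean():.2f}")
print(f"mean beta_post, unconditional:    {beta_post.mean():+.2f}")
print(f"mean beta_post | passed pre-test: {beta_post[passed].mean():+.2f}")
```

With these parameters the pre-test catches the drift only about 11% of the time, and among the samples that pass, the post-period coefficient is pushed even further from the (zero) truth than the drift alone would imply.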

In a follow-up paper, Roth and Ashesh Rambachan advocate that the analyst look at the range of plausible deviations from parallel pre-treatment trends and use it to form bounds on the treatment effect estimates. This is a very sensible suggestion, though it can be a hard case to make if people think the bounds will be too wide and few other people use them.
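For a flavor of what such bounds look like, here is a deliberately simplified sketch (unlike the actual procedure, it treats the pre-period coefficients as known rather than estimated): under a "relative magnitudes" style restriction, the post-period violation of parallel trends is assumed no larger than some multiple of the worst pre-period violation, which translates directly into an interval around the point estimate:

```python
import numpy as np

def trend_violation_bounds(beta_post, beta_pre, Mbar=1.0):
    """Bounds on the treatment effect when the post-period trend violation
    is assumed at most Mbar times the largest pre-period violation.
    (Simplified: ignores sampling noise, unlike Rambachan and Roth.)"""
    worst = Mbar * np.max(np.abs(beta_pre))
    return beta_post - worst, beta_post + worst

# Hypothetical estimates: two pre-period coefficients, one post-period.
lo, hi = trend_violation_bounds(beta_post=0.8,
                                beta_pre=np.array([0.05, -0.10]))
print(f"identified set for the effect: [{lo:.2f}, {hi:.2f}]")  # [0.70, 0.90]
```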

From my own work in machine learning, my natural reaction to data-dependent analysis is to think about sample splitting. The standard idea of splitting the treated and control units into "test assumptions" and "effect estimation" subsamples would work, but it would worsen the power problem. Another approach becomes possible once you see why the separate time coefficients are correlated. In a model with i.i.d. data, it is because each time coefficient is an offset from a base period, so the data from the base period enters every temporal coefficient. The correlations can be alleviated if we instead split the sample in time: use all but one period of the pre-treatment window to test for pre-trend differences (at a small cost in power) and use the remaining periods for effect estimation, as in the sketch below. The two sets of estimates now use different base periods and so are no longer correlated. There is still a problem if there is serial correlation in the outcome after accounting for controls (e.g., serial correlation in the errors), though it will be smaller than in the naive analysis because the gap between the two samples is larger.
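A minimal sketch of what I mean, assuming a balanced two-group panel with pre-treatment periods -4 to -1 and treatment starting at period 0 (the simulated data and helper functions are mine, purely for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Fake balanced panel: 200 units (first 100 treated), periods -4..2,
# parallel trends holds exactly, true treatment effect 0.5 from t = 0.
rows = [(u, t, int(u < 100), rng.normal(0.5 * (u < 100) * (t >= 0)))
        for u in range(200) for t in range(-4, 3)]
df = pd.DataFrame(rows, columns=["unit", "period", "treated", "y"])

def gap(df, t):
    """Treated-minus-control mean outcome in period t."""
    sub = df[df["period"] == t]
    return (sub.loc[sub["treated"] == 1, "y"].mean()
            - sub.loc[sub["treated"] == 0, "y"].mean())

def event_study(df, periods, base):
    """Event-study coefficients relative to `base`, using only `periods`."""
    return {t: gap(df, t) - gap(df, base) for t in periods if t != base}

# Pre-trend test uses periods -4..-2 with base -4; period -1 is held out.
pretest_coefs = event_study(df, periods=[-4, -3, -2], base=-4)

# Effect estimation re-bases on the held-out period -1, so these
# coefficients share no base-period noise with the pre-test above.
effect_coefs = event_study(df, periods=[-1, 0, 1, 2], base=-1)

print("pre-test coefficients:", {t: round(b, 2) for t, b in pretest_coefs.items()})
print("effect coefficients:  ", {t: round(b, 2) for t, b in effect_coefs.items()})
```

Under i.i.d. errors, the two sets of coefficients are built from disjoint periods, so passing or failing the pre-trend test carries no information about the noise in the effect estimates.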

The longer version of the pre-trends paper actually has fancier modifications (Appendix B) that try to reduce the bias even without sample splitting. Maybe these will see some take-up.

In the end, Roth's papers force one to remember that, fundamentally, parallel trends is an untestable assumption. There often happens to be some related data that might be helpful, but we should still think carefully about it, as we would about other assumptions (e.g., exclusion restrictions).