In a separate blog post, I wrote about synthetic controls, a clever way to build a comparison group when you do not have one. Today I turn to its simpler and far more common cousin, a workhorse of policy evaluation: difference-in-differences.
Start with the problem it solves. A policy changes in one place and not another. One state raises a tax, passes a law, or launches a program; a neighbor does not. You want to know the effect. The naive approach is to look at the treated place before and after. But a simple before-and-after comparison cannot isolate the policy's effect, because the world does not hold still. The economy shifts, the seasons turn, new trends arrive. Any change you see is a tangle of the policy and everything else that happened in the meantime.
Difference-in-differences cuts through the tangle with a second difference. You track the treated group before and after, and you track an untreated comparison group over the same period. The comparison group is also exposed to the economy, the seasons, and the trends, just not to the policy. So whatever happened to the comparison group is your estimate of what would have happened to the treated group anyway. Subtract that out, and what remains is the effect of the policy. The first difference is before versus after. The second difference is treated versus comparison. Hence the name.
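To make the two differences concrete, here is a minimal sketch in Python. It assumes you already have the average outcome for each group in each period; the function name and the numbers are hypothetical, chosen only to show the arithmetic.

```python
# Minimal 2x2 difference-in-differences from four group means.
# All numbers are hypothetical and for illustration only.

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Return the difference-in-differences estimate from four group means."""
    change_treated = treated_post - treated_pre    # first difference: before vs after (treated)
    change_control = control_post - control_pre    # first difference: before vs after (comparison)
    return change_treated - change_control         # second difference: treated vs comparison

effect = did_estimate(treated_pre=20.0, treated_post=21.0,
                      control_pre=23.0, control_post=21.0)
print(effect)  # 3.0
```

Here the treated group rose by 1 while the comparison group fell by 2, so the estimated effect is 1 − (−2) = 3. Notice that the treated group never actually gained 3: the estimate is measured against what the comparison group suggests would have happened anyway.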
The famous example is a 1994 study by David Card and Alan Krueger. In 1992, New Jersey raised its minimum wage while neighboring Pennsylvania did not. The researchers surveyed fast-food restaurants in both states before and after. Employment in Pennsylvania drifted down over those months; had New Jersey simply followed, its employment would have drifted down too. Instead it held roughly steady. The difference between the two differences was the estimated effect, and it ran against the textbook prediction that a higher minimum wage cuts jobs. The result was influential and hotly debated for years, a lesson in itself about how a single study lands.
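With store-level data like this, the same estimate is usually computed as a regression with a treated-by-after interaction, which makes it easy to add standard errors and controls later. The sketch below uses NumPy's least squares on made-up store-level numbers (these are not Card and Krueger's data). Because the model has one parameter per cell, the interaction coefficient reproduces the two-by-two difference-in-differences exactly.

```python
import numpy as np

# Hypothetical store-level employment (NOT the actual Card-Krueger data).
# Each row: (treated_state, after_policy, employment)
rows = [
    (1, 0, 19.0), (1, 0, 21.0),   # treated state, before
    (1, 1, 20.0), (1, 1, 22.0),   # treated state, after
    (0, 0, 22.0), (0, 0, 24.0),   # comparison state, before
    (0, 1, 20.0), (0, 1, 22.0),   # comparison state, after
]
data = np.array(rows)
treated, post, y = data[:, 0], data[:, 1], data[:, 2]

# Design matrix: intercept, treated, post, treated x post.
X = np.column_stack([np.ones_like(y), treated, post, treated * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha, beta, gamma, delta = coef

# delta is the difference-in-differences estimate.
print(round(delta, 6))  # 3.0
```

The cell means are 20 and 21 for the treated state and 23 and 21 for the comparison state, so delta matches (21 − 20) − (21 − 23) = 3. In real work you would also want standard errors that account for stores being grouped by state; this sketch stops at the point estimate.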
The whole method rests on one assumption, and it is worth saying plainly because it is so easy to skip past. It is called parallel trends: the idea that, absent the policy, the treated and comparison groups would have moved in parallel. You can never prove this, because you cannot observe the world where the policy did not happen. What you can do is check whether the two groups moved together before the intervention. If they tracked each other for years beforehand, the assumption is more believable. If they were already diverging, difference-in-differences will hand you a confident, wrong answer.
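One simple way to look at the pre-period is to fit a straight line to each group's outcome before the policy and compare the slopes. Here is a minimal sketch, assuming yearly data indexed by years relative to the policy; every series is hypothetical. A slope gap near zero is consistent with parallel trends but does not prove them. A large gap is a warning sign.

```python
import numpy as np

# Years relative to the policy (-5 = five years before). Hypothetical data.
t = np.arange(-5, 0)
treated = np.array([10.0, 10.5, 11.0, 11.5, 12.0])
comparison_parallel = np.array([8.0, 8.5, 9.0, 9.5, 10.0])   # same slope
comparison_diverging = np.array([8.0, 8.2, 8.4, 8.6, 8.8])   # flatter slope

def pre_trend_slope(t, outcome):
    """Slope of a straight-line fit to the pre-period outcome."""
    slope, _intercept = np.polyfit(t, outcome, deg=1)
    return slope

gap_parallel = pre_trend_slope(t, treated) - pre_trend_slope(t, comparison_parallel)
gap_diverging = pre_trend_slope(t, treated) - pre_trend_slope(t, comparison_diverging)

print(f"slope gap, parallel comparison:  {gap_parallel:.3f}")
print(f"slope gap, diverging comparison: {gap_diverging:.3f}")
```

In the diverging case, the treated group was already pulling ahead by about 0.3 per year before the policy. A difference-in-differences on that pair would fold this pre-existing drift into the estimated effect. Plotting the two series is often more persuasive than any single number, which is why the pre-period chart matters.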
And there are traps even when the trends look parallel. The comparison group must be genuinely unaffected; if New Jersey’s hike pulled workers across the border, Pennsylvania is contaminated. Other changes sneak in too: New Jersey also cut its sales tax that same year, so part of any effect could belong to the tax, not the wage. A clean difference-in-differences needs a comparison that is truly comparable, truly untouched, and not subject to some other shock at the same moment.
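A little arithmetic shows how each trap distorts the answer. This sketch assumes a hypothetical true effect, a shared trend, a spillover that lowers the comparison group's outcome (think workers pulled across the border), and a separate shock that hits only the treated group (think a tax change in the same year). All values are made up.

```python
# Hypothetical illustration of how contamination and concurrent shocks bias DiD.
true_effect = 1.0     # assumed true effect of the policy on the treated group
common_trend = -2.0   # assumed change both groups would have seen anyway
spillover = -0.5      # assumed policy-caused change in the comparison group
other_shock = 0.25    # assumed unrelated shock hitting only the treated group

treated_change = common_trend + true_effect + other_shock
comparison_change = common_trend + spillover   # contaminated comparison

did = treated_change - comparison_change
print(did)  # 1.75, i.e. true_effect + other_shock - spillover
```

The common trend cancels, just as it should, but the spillover and the concurrent shock do not. The estimate of 1.75 overstates the true effect of 1.0. No amount of checking pre-period trends will catch either problem, because both arrive at the same moment as the policy.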
For those of us evaluating policies and programs, this is one of the most useful tools there is, precisely because the real world keeps handing us situations where one group changed and a similar one did not. But it is only as good as its comparison and its assumption. The discipline is to show the pre-period trends, argue honestly for parallel trends rather than assert it, and hunt for the other things that changed at the same time.
So here is my question: When you have used a difference-in-differences design, how did you defend the parallel trends assumption, and what convinced you the comparison group was really a fair stand-in?