We live in the age of evidence-based everything: evidence-based policy, evidence-based practice, evidence-based design. When someone says “the research shows,” it is meant to end the argument. But a quiet problem is buried in that phrase. The research you get to see is not a fair sample of the research that was actually done.
The problem has a memorable name, coined by the psychologist Robert Rosenthal in 1979: the file drawer problem. Studies that find a strong, statistically significant effect tend to get written up, submitted, and published. Studies that find nothing, a null result, tend to get quietly filed away and forgotten. The finished research that reaches you has passed through a filter, and the filter favors exciting results over boring ones.
How big is the filter? In a striking 2014 study in Science, researchers tracked a known population of social science experiments, so they knew which ones were run and what each found. Among studies with strong results, most were published. Among those that came up null, only about one in five made it into a journal. And the biggest reason was not that journals rejected the nulls. It was that the authors never even wrote them up. The dead ends went straight into the drawer.
The consequence is visible in the literature itself. Across fields, the large majority of published studies report a statistically significant or positive finding, and in some fields the share is well over ninety percent. That cannot be what real research looks like, because real research is full of dead ends and things that do not pan out. What we are seeing is not the truth about how often effects are real. We are seeing the truth about what gets published.
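A back-of-the-envelope calculation shows why a share that high is implausible. If some fraction of the hypotheses a field tests are true, the share of honestly analyzed studies that come out significant is that fraction times the statistical power, plus the false-positive rate (alpha) times the remaining fraction. The Python sketch below plugs in a few combinations; the function name and every prior and power value are hypothetical, chosen for illustration rather than estimated for any real field.

```python
def expected_significant_share(prior_true, power, alpha=0.05):
    """Share of all studies run that we expect to be significant.

    Assumes honest analysis (no p-hacking): true effects are detected at rate
    `power`, and studies of null effects give false positives at rate `alpha`.
    """
    return prior_true * power + (1 - prior_true) * alpha


# Hypothetical scenarios: (share of tested hypotheses that are true, power)
for prior_true, power in [(0.3, 0.5), (0.5, 0.8), (0.9, 0.8)]:
    share = expected_significant_share(prior_true, power)
    print(f"prior={prior_true:.0%}, power={power:.0%}: {share:.1%} significant")
```

These three cases work out to 18.5, 42.5 and 72.5 percent. Even the generous ones, where half or nine in ten tested hypotheses are true and power is 80 percent, fall short of the ninety-plus percent figures seen in print.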
This matters for anyone who relies on a body of evidence. Run a literature review or meta-analysis using only published studies, and you are averaging a sample stripped of its failures. The effect you calculate comes out too big, sometimes dramatically, because the studies that found nothing sit invisibly in a thousand drawers. It is one of the main reasons impressive findings shrink or vanish when someone tries to replicate them.
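To see how the averaging goes wrong, here is a toy simulation in Python, a minimal sketch under stated assumptions rather than a model of any real literature. It runs many small two-group studies of a modest true effect (0.2 standard deviations with 30 people per group; both numbers are hypothetical), then treats as "published" only the studies that come out positive and significant at p < 0.05, a crude stand-in for the real filter. The function names are illustrative, and the significance test is a z-test that treats the variance as known, a simplification of the t-test a real study would use.

```python
import math
import random
import statistics


def simulate_study(true_effect, n_per_group, rng):
    """Run one hypothetical two-group study; return (estimated effect, p-value).

    Outcomes have unit standard deviation, so effects are in SD units.
    The p-value comes from a two-sided z-test that treats the variance as
    known (a simplification of a t-test).
    """
    treatment = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    estimate = statistics.fmean(treatment) - statistics.fmean(control)
    standard_error = math.sqrt(2.0 / n_per_group)
    z = estimate / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p under N(0, 1)
    return estimate, p_value


def run_literature(n_studies=2000, true_effect=0.2, n_per_group=30, alpha=0.05, seed=1):
    """Simulate a field, then apply a crude file-drawer filter.

    Returns (mean effect over all studies run, mean effect over "published"
    studies, share of studies published). Only positive, significant results
    count as published; everything else goes in the drawer.
    """
    rng = random.Random(seed)
    studies = [simulate_study(true_effect, n_per_group, rng) for _ in range(n_studies)]
    all_effects = [est for est, _ in studies]
    published = [est for est, p in studies if p < alpha and est > 0]
    return (
        statistics.fmean(all_effects),
        statistics.fmean(published),
        len(published) / n_studies,
    )


all_mean, published_mean, share = run_literature()
print(f"mean over all studies run:   {all_mean:.2f}")
print(f"mean over published studies: {published_mean:.2f}")
print(f"share of studies published:  {share:.0%}")
```

With these settings, only a small minority of studies clear the filter. The average over every study run stays close to the true 0.2, while the average over the published ones lands at roughly three times that, even though no individual study was done dishonestly.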
This is the companion to a problem I have written about before. P-hacking manufactures false positives inside one study; publication bias decides which finished studies you ever see. One distorts the result, the other the whole library. Together they make a field look more certain than it is.
The fixes are mostly structural and gaining ground. Study registries record what researchers plan before they do it, revealing which studies were started and never reported. Registered reports, where a journal commits to publish based on the design before results are known, break the link between significance and publication. And in reviews, tools like funnel plots try to detect when the small, null studies are missing. The deeper habit is simpler: treat the absence of published failures as a question, not a reassurance.
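One common way to check a funnel plot for lopsidedness is Egger's regression test: regress each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error). In a symmetric funnel the intercept sits near zero. The Python sketch below is a minimal illustration under stated assumptions. The simulated literature, the sample-size range, the function names, and the 20 percent chance that a non-significant study escapes the drawer (loosely echoing the one-in-five figure from the Science study) are all hypothetical, and it reports only the intercept, not the significance test a real analysis would run.

```python
import math
import random


def simulate_literature(n_studies, true_effect, null_publish_rate, seed):
    """Hypothetical literature of two-group studies with varying sizes.

    Each estimate is drawn straight from its sampling distribution,
    N(true_effect, se**2) with se = sqrt(2 / n), instead of simulating raw data.
    Returns (every study run, studies that got published) as (effect, se) pairs.
    """
    rng = random.Random(seed)
    everything, published = [], []
    for _ in range(n_studies):
        n = rng.randint(10, 200)  # per-group sample size, an assumed range
        se = math.sqrt(2.0 / n)
        effect = rng.gauss(true_effect, se)
        everything.append((effect, se))
        # Positive and significant at the two-sided 5% level: always published.
        # Anything else escapes the file drawer only with probability null_publish_rate.
        if effect / se > 1.96 or rng.random() < null_publish_rate:
            published.append((effect, se))
    return everything, published


def egger_intercept(studies):
    """Egger's regression: ordinary least squares of effect/se on 1/se.

    Returns the intercept. In a symmetric funnel it sits near zero; small
    studies with inflated effects pull it away from zero.
    """
    y = [effect / se for effect, se in studies]
    x = [1.0 / se for _, se in studies]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x


everything, published = simulate_literature(10_000, true_effect=0.2, null_publish_rate=0.2, seed=7)
print(f"Egger intercept, every study run: {egger_intercept(everything):.2f}")
print(f"Egger intercept, published only:  {egger_intercept(published):.2f}")
```

Across every study run, the intercept stays near zero; in the published subset it is clearly positive, because the small studies that survive the filter are the ones that happened to overshoot. Asymmetry can also have other causes, such as small studies genuinely differing from large ones, so a lopsided funnel is a prompt for questions rather than proof of missing studies.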
For those of us who build evidence reviews or cite best practices to clients, the discipline is to ask what is not on the page. A practice that looks well supported may simply be one whose failures were never written down. The most important studies in a review are sometimes the ones you cannot find.
So here is my question: When you assess a body of evidence, how do you account for the studies that were run but never published, and has the missing research ever changed your read of what the evidence really says?
