Stephen T’s Blog Spot

A blog aimed at issues only data scientists, data analysts, statisticians, evaluators, and researchers care about.

The most famous cautionary tale in observational research goes like this. For years, study after study found that women on hormone replacement therapy had healthier hearts, and it was widely prescribed partly for that reason. Then a large randomized trial, the Women’s Health Initiative, found the opposite: the therapy raised cardiac risk. A great deal of practice had been built on a backward result. The reckoning that followed gave us the best discipline we have for analyzing observational data, and it is the repair to last time’s problem: target trial emulation.

When researchers went back to reconcile the two findings, the explanation was less mysterious than expected. The gap was largely driven by design choices, especially when women began therapy relative to menopause and how long they were followed, rather than by some hidden flaw in the data. The data was not the main villain. The analysis was. And that is the encouraging part, because analysis is something we control.

The deadliest observational errors are the ones you build in by accident. Immortal time bias is the classic: if you sort people into groups by something that happens later, like who eventually started a drug, you quietly credit that group with a window of time in which they could not possibly have had the outcome, because they had to survive long enough to start. Prevalent-user bias is studying the people already on a treatment, the survivors who tolerated it, rather than new starters. Misaligned time zero is starting the clock at a different moment for each group. None of these requires an unmeasured confounder. They are own-goals, and any one of them can reverse a result on its own.
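
To make immortal time bias concrete, here is a minimal simulation sketch, not an analysis of any real data. It assumes a treatment with no effect at all, a constant death hazard, and a start date drawn at random. The function name and every parameter value are made up for illustration.

```python
import random


def simulate_immortal_time(n=20000, seed=0, hazard=0.2, max_follow_up=10.0):
    """Return (naive_rate_ratio, time_split_rate_ratio) for a NULL treatment.

    Illustrative assumptions: constant death hazard, start date uniform
    over follow-up, and a treatment that changes nothing.
    """
    rng = random.Random(seed)
    # Each entry holds [deaths, person_years].
    naive = {"treated": [0, 0.0], "untreated": [0, 0.0]}
    split = {"treated": [0, 0.0], "untreated": [0, 0.0]}

    for _ in range(n):
        death_time = rng.expovariate(hazard)
        start_time = rng.uniform(0.0, max_follow_up)  # planned start date
        end = min(death_time, max_follow_up)
        died = death_time <= max_follow_up
        # You can only start the drug if you are still alive on the start date.
        ever_treated = start_time < end

        # Naive analysis: group by what happens later, count time from day 0.
        group = "treated" if ever_treated else "untreated"
        naive[group][0] += died
        naive[group][1] += end

        # Time-split analysis: years before the start date count as untreated.
        if ever_treated:
            split["untreated"][1] += start_time
            split["treated"][1] += end - start_time
            split["treated"][0] += died
        else:
            split["untreated"][0] += died
            split["untreated"][1] += end

    def rate_ratio(table):
        treated_rate = table["treated"][0] / table["treated"][1]
        untreated_rate = table["untreated"][0] / table["untreated"][1]
        return treated_rate / untreated_rate

    return rate_ratio(naive), rate_ratio(split)


naive_rr, split_rr = simulate_immortal_time()
print(f"naive rate ratio:      {naive_rr:.2f}")
print(f"time-split rate ratio: {split_rr:.2f}")
```

Under these assumptions the naive comparison makes a useless drug look strongly protective, while assigning the pre-treatment years to the untreated group brings the rate ratio back to roughly 1. Nothing is confounded here. The bias comes entirely from how the groups and the clock were defined.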

Target trial emulation, developed by Miguel Hernán and James Robins, is built to prevent exactly these. The rule is simple and strict: before you touch the data, write the protocol of the randomized trial you wish you could run. Spell out who is eligible, the treatment strategies you are comparing, when follow-up begins, the outcome, the follow-up window, and the precise effect you mean to estimate. Then emulate each piece in your data. Enroll the eligible, restrict to new users, classify treatment at time zero, start everyone’s clock together, and adjust for the baseline factors you measured. Writing the protocol is what does the work. A real trial would never permit immortal time, prevalent users, or a clock that starts whenever convenient, so emulating one drags those errors into the light before they take hold.
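
The protocol-first habit can be sketched in a few lines. This is one possible shape, not a standard API: the class names, field names, and toy patients are all hypothetical, and adjustment for baseline factors is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TargetTrialProtocol:
    """The trial you wish you could run, written down before touching data."""
    eligibility: str
    strategies: tuple
    time_zero: str
    outcome: str
    follow_up_days: int
    estimand: str


@dataclass(frozen=True)
class Patient:
    """Hypothetical record layout; days are counted on a shared calendar."""
    pid: str
    index_day: int               # day eligibility is met, which is time zero
    first_rx_day: Optional[int]  # first-ever prescription, None if never
    event_day: Optional[int]     # outcome date, None if never observed


def emulate_cohort(patients, protocol):
    """Apply the protocol: new users only, one clock, arm fixed at time zero."""
    rows = []
    for p in patients:
        # Prevalent users started before time zero, so a trial could not enroll them.
        if p.first_rx_day is not None and p.first_rx_day < p.index_day:
            continue
        # Eligibility: must be outcome-free at time zero.
        if p.event_day is not None and p.event_day < p.index_day:
            continue
        # Classify with information available at time zero only. Someone who
        # starts later stays in the comparison arm, like intention-to-treat.
        arm = "initiate" if p.first_rx_day == p.index_day else "do_not_initiate"
        end_day = p.index_day + protocol.follow_up_days
        had_event = p.event_day is not None and p.event_day <= end_day
        stop_day = p.event_day if had_event else end_day
        rows.append({
            "pid": p.pid,
            "arm": arm,
            "follow_up_days": stop_day - p.index_day,
            "event": had_event,
        })
    return rows


protocol = TargetTrialProtocol(
    eligibility="no prior use of the drug, outcome-free at the index visit",
    strategies=("initiate", "do_not_initiate"),
    time_zero="index visit",
    outcome="first cardiac event",
    follow_up_days=365,
    estimand="observational analog of the intention-to-treat effect",
)
patients = [
    Patient("a", index_day=0, first_rx_day=0, event_day=None),
    Patient("b", index_day=0, first_rx_day=None, event_day=100),
    Patient("c", index_day=0, first_rx_day=-30, event_day=None),  # prevalent user
    Patient("d", index_day=0, first_rx_day=50, event_day=400),    # late starter
]
cohort = emulate_cohort(patients, protocol)
```

In this toy run the prevalent user is dropped, the late starter stays in the comparison arm instead of donating immortal time to the treated arm, and everyone’s follow-up is measured from the same index visit.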

Be clear about what this does and does not buy you. Target trial emulation is not a randomizer. It cannot conjure away unmeasured confounding, the one problem only real random assignment solves. What it does is separate the two kinds of trouble. Done faithfully, it removes the self-inflicted, design-induced biases, and it leaves the genuine assumption, that you measured the things that matter, in the open where it can be stated, argued, and stress-tested with a sensitivity analysis. That clarity is the value. It turns ‘we ran some regressions on a database’ into ‘here is the trial we emulated, and here is the one assumption it rests on.’
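
As one example of such a stress test, here is a sketch of the E-value of VanderWeele and Ding. I am choosing it for illustration, and it is only one of several sensitivity analyses for unmeasured confounding. It asks how strongly an unmeasured confounder would need to be associated with both treatment and outcome, on the risk-ratio scale, to fully explain away an observed risk ratio. The function names are mine.

```python
import math


def e_value(rr):
    """E-value for an observed risk ratio (point estimate)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective estimates are inverted so the formula applies to RR >= 1.
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_for_ci(lower, upper):
    """E-value for the confidence limit closest to the null value of 1."""
    if lower <= 1.0 <= upper:
        return 1.0  # the interval already includes no effect
    nearest_limit = lower if lower > 1.0 else upper
    return e_value(nearest_limit)


print(round(e_value(2.0), 2))  # 3.41
```

An observed risk ratio of 2 gives an E-value of about 3.41. A confounder associated with both treatment and outcome by risk ratios of 3.41 each could account for the whole estimate, while a weaker one could not.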

So the practical rule is almost embarrassingly simple. Any time you are about to compare groups in administrative or real-world data, write the target-trial protocol first. Define time zero, use a new-user design, and name the estimand before you run anything. It is the cheapest, most effective safeguard against fooling yourself, and it is fast becoming what regulators and careful reviewers expect to see.

So here is my question: When you analyze observational data, do you write down the trial you are emulating before you start, and has that discipline ever caught a self-inflicted bias before it reached your conclusions?
