Stephen T’s Blog Spot

A blog aimed at issues only data scientists, data analysts, statisticians, evaluators, and researchers care about.

There is a piece of advice that sounds responsible and is sometimes exactly wrong. When you worry that a comparison might be confounded, the reflex is to control for more variables, to put everything into the model just to be safe. But some variables do not remove bias when you control for them. They create it. They can manufacture a relationship that was never there.

The troublemaker has a name: a collider. A collider is a variable that is a common effect of two others, two arrows pointing into it. When two things both influence a third, something strange happens if you hold that third thing fixed, by adjusting for it, stratifying on it, or restricting your sample to it. The two causes become artificially correlated, even if they had nothing to do with each other. Conditioning on the common effect opens a path that was supposed to stay closed.

An example makes it concrete. Imagine a selective program that admits people who are strong on either a written test or an interview. Across all applicants, test and interview scores might be unrelated. But among those admitted, they will look negatively correlated: someone who got in with a mediocre test probably had a strong interview, and the reverse. The negative relationship is purely an artifact of selecting on admission, the collider. Nothing about the applicants changed. The sample did.
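A quick simulation makes the admissions example tangible. This is a minimal sketch under assumptions of my own choosing: both scores are independent standard normal draws, and the cutoff of 1.0 is hypothetical, not taken from any real program.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
n = 20000

# Assumption: test and interview scores are generated independently.
test = [rng.gauss(0, 1) for _ in range(n)]
interview = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical admission rule: strong on either measure gets you in.
CUTOFF = 1.0
admitted = [(t, i) for t, i in zip(test, interview) if t > CUTOFF or i > CUTOFF]

r_all = pearson(test, interview)
r_admitted = pearson([t for t, _ in admitted], [i for _, i in admitted])

print(f"all applicants: r = {r_all:.3f}")
print(f"admitted only:  r = {r_admitted:.3f}")
```

Under these assumptions the correlation among all applicants should sit near zero, while the correlation among the admitted comes out clearly negative. The scores themselves never changed; only the filter did.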

This is not a classroom curiosity. The oldest version is Berkson’s paradox, described in 1946: study only hospitalized patients and two unrelated conditions can appear linked, because each one raises the chance of being hospitalized. A modern version is the obesity paradox. In the general population, obesity raises mortality, yet among patients who already have heart disease, it can appear protective. One leading explanation is that heart disease is a collider, a common effect of obesity and other risk factors, so conditioning on it distorts the picture. Adjust for the wrong variable and a harm can look like a benefit.

Here is what makes this genuinely hard. It is the mirror image of confounding. A confounder is a common cause of the treatment and the outcome, and you must adjust for it to get the right answer. A collider is a common effect, and you must not. The two can look identical in a dataset. The only way to tell them apart is to reason about the causal structure, which arrow points where, before you decide what goes into the model. The numbers alone will never tell you.
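To see the mirror image in numbers, here is a rough sketch with two simulated worlds in which the treatment X has no effect at all on the outcome Y. The data-generating equations and coefficients are assumptions made up for illustration. In the first world Z is a confounder (it causes both X and Y); in the second Z is a collider (X and Y both cause it). The adjusted slope is computed by residualizing on Z, which gives the same coefficient as a regression of Y on X and Z.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(y, x):
    """Simple regression slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """What is left of y after removing its linear fit on x."""
    b = slope(y, x)
    mx, my = mean(x), mean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

def adjusted_slope(y, x, z):
    """Slope of y on x holding z fixed (partial regression)."""
    return slope(residuals(y, z), residuals(x, z))

rng = random.Random(42)
n = 20000

# World 1 (assumed): Z -> X and Z -> Y, no arrow from X to Y.
z1 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [z + rng.gauss(0, 1) for z in z1]
y1 = [z + rng.gauss(0, 1) for z in z1]
conf_raw = slope(y1, x1)
conf_adj = adjusted_slope(y1, x1, z1)

# World 2 (assumed): X -> Z and Y -> Z, X and Y independent.
x2 = [rng.gauss(0, 1) for _ in range(n)]
y2 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [a + b + rng.gauss(0, 1) for a, b in zip(x2, y2)]
coll_raw = slope(y2, x2)
coll_adj = adjusted_slope(y2, x2, z2)

print(f"confounder: unadjusted {conf_raw:.2f}, adjusted {conf_adj:.2f}")
print(f"collider:   unadjusted {coll_raw:.2f}, adjusted {coll_adj:.2f}")
```

In the confounder world, adjusting for Z pulls a spurious slope back toward the true value of zero. In the collider world, the unadjusted slope is already about right and adjusting for Z is what breaks it. Nothing in the three columns of numbers says which world you are in.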

This is why the comforting rule, control for everything you can measure, is wrong. Some of those variables are colliders. Others are affected by the treatment itself: adjusting for them can block part of the very effect you are trying to estimate, and if they also share causes with the outcome, they behave as colliders too. Adding them all does not buy safety; it can introduce bias that was not there before. And, as in a recent post, more data does not save you: a larger sample just pins down the distorted association more precisely.
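The point about sample size can be checked with a small experiment. The sketch below reuses a hypothetical selection rule (independent standard normal scores, selected if either exceeds 1.0) and repeats the study ten times at each of three sample sizes. The sizes, the number of repeats, and the cutoff are all arbitrary choices for illustration.

```python
import random
import statistics

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def selected_correlation(n, rng, cutoff=1.0):
    """Draw n independent score pairs, keep the selected, return their r."""
    kept = []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        if a > cutoff or b > cutoff:  # hypothetical selection rule
            kept.append((a, b))
    return pearson([a for a, _ in kept], [b for _, b in kept])

rng = random.Random(1)
summary = {}
for n in (200, 2000, 20000):
    estimates = [selected_correlation(n, rng) for _ in range(10)]
    # Store the average estimate and its spread across the repeated studies.
    summary[n] = (statistics.mean(estimates), statistics.stdev(estimates))
    print(f"n = {n:>5}: mean r = {summary[n][0]:.3f}, spread = {summary[n][1]:.3f}")
```

The true correlation in this setup is zero by construction. As n grows, the spread across repeated studies shrinks, but the estimates tighten around a clearly negative value rather than around zero. Precision improves; the bias stays put.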

So the real move is conceptual, not statistical. Before you adjust for a variable, ask what causes it. If the treatment or the outcome influences that variable, or if it is a common effect of things that drive each of them, leave it out. Causal diagrams exist precisely to make these choices visible, marking which variables are confounders to control and which are colliders to avoid. The urge to control for more should give way to controlling for the right things.
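Asking "what causes it" can even be written down mechanically once you have committed to a diagram. The sketch below is a deliberately simple check, not a full implementation of the backdoor criterion: it only looks at whether a candidate variable sits upstream or downstream of the treatment and outcome, and it will miss subtler cases such as a common effect of causes of each. The graph and its variable names (an imaginary training program) are hypothetical.

```python
def descendants(graph, node, avoid=None):
    """All nodes reachable from `node` along arrows, never entering `avoid`."""
    seen, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for child in graph.get(cur, ()):
            if child != avoid and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def classify(graph, var, treatment, outcome):
    """Rough label for a candidate control variable in an assumed diagram."""
    from_t = var in descendants(graph, treatment, avoid=outcome)
    from_y = var in descendants(graph, outcome)
    if from_t and from_y:
        return "collider: leave out"
    if from_y:
        return "effect of the outcome: leave out"
    if from_t:
        return "post-treatment: leave out for a total effect"
    causes_t = treatment in descendants(graph, var)
    causes_y = outcome in descendants(graph, var, avoid=treatment)
    if causes_t and causes_y:
        return "confounder: adjust"
    return "unclassified: think harder"

# Hypothetical diagram, written as cause -> list of direct effects.
graph = {
    "motivation": ["enrolled", "earnings"],
    "enrolled": ["earnings", "skills", "responded"],
    "skills": ["earnings"],
    "earnings": ["responded"],
}

labels = {
    v: classify(graph, v, treatment="enrolled", outcome="earnings")
    for v in ("motivation", "skills", "responded")
}
for name, label in labels.items():
    print(f"{name}: {label}")
```

In this made-up diagram, motivation is flagged as a confounder, skills as a post-treatment variable, and survey response as a collider. The code does no thinking for you: every label depends entirely on the arrows you were willing to draw.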

In evaluation, this hides in plain sight. We restrict samples to program completers, or to those who responded, or to sites that reported data, and we load models with covariates to look thorough. Every restriction and every control can remove bias or create it. Selecting on completion, response, or survival is conditioning on a collider whenever both the program and the outcome, or their causes, shape who ends up in the sample. It is the same trap in a different coat.

So here is my question: When you choose your control variables, do you reason about what causes what, or does the list come from whatever happens to be in the dataset?
