In a previous post I argued that missing data is rarely random, that deleting or averaging it away quietly biases your results, and that you usually cannot prove whether the gaps are the benign kind or the dangerous kind. That is a bleak place to stop, so here is the repair. It comes in two moves: fill the holes honestly, then test how much your conclusion depends on the guesses.
The first move is multiple imputation, an idea from the statistician Donald Rubin. The instinct most people have is to fill each blank with a single best guess, a predicted value or an average. The trouble is that a single guess pretends you know something you do not, and your software then treats that invented number as if it were real data, reporting a precision you have not earned. Multiple imputation refuses the pretense. Instead of filling each gap once, it fills it many times, drawing a range of plausible values from a model built on everything else you observed. You end up with several complete datasets that agree on what you measured and differ exactly where you were guessing.
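To make the idea concrete, here is a minimal sketch in Python with NumPy. It assumes a toy setup: one fully observed predictor x and an outcome y with gaps, imputed by a normal linear regression. The function name `impute_many` and the simulated data are hypothetical illustrations, not Rubin's own code or any package's API. Each completed copy keeps every observed value and redraws only the blanks. It also draws its own regression coefficients and residual variance, so the guesses carry uncertainty about the model as well as noise around it.

```python
import numpy as np

def impute_many(x, y, m=20, seed=0):
    """Return m completed copies of y, filling its NaNs by normal linear
    regression on x. Observed values are never touched; only the gaps vary."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])   # intercept + predictor
    miss = np.isnan(y)
    X_obs, y_obs = X[~miss], y[~miss]
    n_obs, p = X_obs.shape

    # Fit the imputation model on the observed cases only.
    beta_hat, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
    resid = y_obs - X_obs @ beta_hat
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)

    completed = []
    for _ in range(m):
        # Draw a plausible residual variance and coefficient vector, so each
        # copy also reflects uncertainty about the model itself...
        sigma2 = resid @ resid / rng.chisquare(n_obs - p)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        # ...then draw each missing value around that line, with noise.
        y_new = y.copy()
        y_new[miss] = X[miss] @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
        completed.append(y_new)
    return completed

# Toy data: missingness in y depends only on the observed x (missing at random).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(size=200)
y[rng.random(200) < np.where(x > 0, 0.4, 0.1)] = np.nan
datasets = impute_many(x, y, m=5)
```

Any single entry of `datasets` looks like ordinary complete data. The point is to analyze all of them and compare the answers.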
You then analyze each completed dataset in the usual way and pool the results using what are called Rubin’s rules. The pooled estimate is just the average of the per-dataset estimates. The clever part is the uncertainty. The pooled variance combines two things: the ordinary uncertainty within each dataset, and the variation between the datasets, meaning the degree to which your answer wobbled as the guesses changed. That between-dataset piece is scaled up slightly to allow for having drawn only a finite number of datasets. It is the honesty. It is the cost of not having observed the data, written directly into your confidence interval, where a single imputation would have hidden it.
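The pooling step is short enough to write out. This is a plain-Python sketch that assumes each completed dataset has already produced a point estimate and its squared standard error. The names `rubin_pool`, `est`, and `var`, and the five results fed in, are made up for illustration. With m imputations, the pooled estimate is the mean of the estimates. W is the mean within-imputation variance and B is the sample variance of the estimates. The total variance is T = W + (1 + 1/m)B, where the (1 + 1/m) factor accounts for the finite number of imputations.

```python
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Pool per-dataset results with Rubin's rules.

    estimates: one point estimate per completed dataset
    variances: the matching squared standard errors (within-imputation variances)
    Returns (pooled estimate, W, B, total variance T).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)        # pooled estimate: plain average
    w = mean(variances)            # within-imputation variance W
    b = variance(estimates)        # between-imputation variance B (divides by m - 1)
    t = w + (1 + 1 / m) * b        # total variance T = W + (1 + 1/m) B
    return q_bar, w, b, t

# Hypothetical results from five completed datasets.
est = [1.52, 1.47, 1.58, 1.44, 1.55]
var = [0.010, 0.011, 0.009, 0.010, 0.012]
q_bar, w, b, t = rubin_pool(est, var)
se = t ** 0.5
# Rough normal-approximation 95% interval; Rubin's rules proper use a
# t reference distribution with adjusted degrees of freedom.
low, high = q_bar - 1.96 * se, q_bar + 1.96 * se
```

If every dataset gave the same answer, B would be zero and T would collapse to W, which is what a single imputation silently assumes.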
Multiple imputation is valid in the safe and the moderate cases, MCAR (missing completely at random) and MAR (missing at random), provided the imputation model is reasonable. You can strengthen it by including auxiliary variables: things that help predict either the missing value or the fact that it is missing, even if they play no part in your final analysis. Adding them makes the at-random assumption more plausible and shrinks the bias if it is not quite true.
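A small simulation can show why auxiliary variables matter, under conditions chosen for illustration. Here a hypothetical auxiliary variable z predicts both the outcome y and whether y goes missing, and the true mean of y is 0. Imputing y from a model that ignores z reproduces the complete-case bias. Including z in the imputation model largely removes it. The data-generating numbers and the helper `mi_mean` are illustrative assumptions. The imputation is a hand-rolled normal linear regression, not any particular package.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
z = rng.normal(size=n)                        # auxiliary variable, always observed
y = 0.8 * z + rng.normal(scale=0.6, size=n)   # outcome; true mean is 0
miss = rng.random(n) < np.where(z > 0.5, 0.7, 0.1)   # high-z cases go missing more
y_obs = np.where(miss, np.nan, y)

def mi_mean(y_obs, predictors, m=20, seed=0):
    """Multiply impute y by normal linear regression on `predictors` and
    return the pooled (averaged) estimate of the mean of y."""
    r = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(y_obs))] + predictors)
    obs = ~np.isnan(y_obs)
    beta_hat, *_ = np.linalg.lstsq(X[obs], y_obs[obs], rcond=None)
    resid = y_obs[obs] - X[obs] @ beta_hat
    df = obs.sum() - X.shape[1]
    cov_unscaled = np.linalg.inv(X[obs].T @ X[obs])
    means = []
    for _ in range(m):
        # Draw model parameters, then draw the missing values around them.
        sigma2 = resid @ resid / r.chisquare(df)
        beta = r.multivariate_normal(beta_hat, sigma2 * cov_unscaled)
        filled = y_obs.copy()
        filled[~obs] = X[~obs] @ beta + r.normal(0.0, np.sqrt(sigma2), (~obs).sum())
        means.append(filled.mean())
    return float(np.mean(means))

without_aux = mi_mean(y_obs, [])    # imputation model ignores z
with_aux = mi_mean(y_obs, [z])      # imputation model includes z
```

Without z, the missingness looks "not at random" to the imputation model. With z, it becomes at random given what the model sees.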
But recall the catch from last time: you cannot prove the data are missing at random. Multiple imputation assumes it. So a careful analysis does not stop there. It asks the second question: what if the assumption is wrong, and the missing values are systematically different from what the model expects, the dangerous case known as missing not at random (MNAR)?
That is the job of sensitivity analysis, and the most intuitive version is the tipping-point analysis. You deliberately push the imputed values for the missing cases away from the at-random guess. For example, you might assume the dropouts did a little worse than the model predicts, then worse still. At each step you rerun the analysis. If your result survives even large, pessimistic departures, it is robust. If a tiny nudge flips it from significant to null, it was fragile all along, resting on an assumption you could never test. Reporting that tipping point, how far reality would have to depart from at-random to overturn your finding, is among the most honest things an analyst can do. It is now routine in well-run clinical trials, precisely because lives ride on the answer.
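Here is a minimal sketch of a delta-adjusted tipping-point analysis. It assumes a hypothetical two-arm trial where higher outcomes are better and the effect of interest is the difference in means. Dropouts are imputed from a simple per-arm normal model, a deliberately simple stand-in for a real imputation model. The treated dropouts' imputed values are then lowered by delta, and the completed datasets are pooled with Rubin's rules. The function names, the simulated data, the grid of deltas, and the rough normal-approximation interval are all illustrative assumptions. Reusing the same seed for every delta means the only thing changing along the grid is the shift itself.

```python
import numpy as np

def impute_arm(y, rng, delta=0.0):
    """Fill the NaNs in one trial arm from a normal model of that arm's observed
    values, then lower every imputed value by delta (delta > 0: dropouts did worse)."""
    miss = np.isnan(y)
    obs = y[~miss]
    n = obs.size
    # Draw a plausible variance and mean (normal model, noninformative prior),
    # so each completed dataset reflects uncertainty about the model itself.
    sigma2 = (n - 1) * obs.var(ddof=1) / rng.chisquare(n - 1)
    mu = rng.normal(obs.mean(), np.sqrt(sigma2 / n))
    filled = y.copy()
    filled[miss] = rng.normal(mu, np.sqrt(sigma2), miss.sum()) - delta
    return filled

def pooled_effect(treat, control, delta, m=50, seed=0):
    """Impute both arms m times, shift only the treated dropouts by delta,
    estimate the difference in means on each completed dataset, and pool
    with Rubin's rules. Returns (pooled estimate, pooled standard error)."""
    rng = np.random.default_rng(seed)   # same seed for every delta
    ests, within = [], []
    for _ in range(m):
        t_arm = impute_arm(treat, rng, delta)
        c_arm = impute_arm(control, rng)  # control dropouts stay at MAR
        ests.append(t_arm.mean() - c_arm.mean())
        within.append(t_arm.var(ddof=1) / t_arm.size + c_arm.var(ddof=1) / c_arm.size)
    w = np.mean(within)
    b = np.var(ests, ddof=1)
    total = w + (1 + 1 / m) * b
    return float(np.mean(ests)), float(np.sqrt(total))

def tipping_point(treat, control, deltas):
    """Return the first delta whose rough 95% interval reaches zero, or None."""
    for d in deltas:
        est, se = pooled_effect(treat, control, d)
        if est - 1.96 * se <= 0:
            return float(d)
    return None

# Hypothetical trial: higher outcome is better, true effect is 2.0.
rng = np.random.default_rng(42)
treat = rng.normal(2.0, 2.0, 120)
control = rng.normal(0.0, 2.0, 120)
treat[rng.random(120) < 0.25] = np.nan    # roughly a quarter of treated patients drop out
control[rng.random(120) < 0.10] = np.nan
tip = tipping_point(treat, control, np.arange(0.0, 20.01, 0.25))
```

Shifting only the treated dropouts is one choice among several. It is the direction that works against the finding. Shifting both arms, or scaling rather than shifting, are also common variants. Whichever you pick, the reported number is the size of the departure from at-random that the conclusion can absorb.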
Put the two moves together and you have a workflow with integrity. Impute many times to use all the information you have and to carry the uncertainty you cannot escape. Then stress-test the conclusion against the possibility that what you could not see was different from what you could. The goal is never to pretend the holes were not there. It is to show, out loud, exactly how much they could matter.
So here is my question: When you handle missing data, do you go past imputation to test how fragile your conclusion is, and have you ever found that the answer hinged entirely on the cases you could not observe?