Stephen T’s Blog Spot

A blog aimed at issues only data scientists, data analysts, statisticians, evaluators, and researchers care about.

In another post I argued that a giant sample drawn from a skewed frame is still wrong, and that size cannot save it. That raises the obvious question. If you are stuck with an imperfect sample, and most of us usually are, can statistics fix it after the fact? The honest answer is: often, and sometimes impressively, but only up to a hard limit worth understanding.

Start with the workhorse, weighting. Suppose young people make up twice their population share in your sample and older people half theirs. You assign each respondent a weight, the group's population share divided by its sample share, that pulls the sample back into line with known population totals, counting underrepresented groups for more and overrepresented groups for less. Anchored to solid benchmarks, usually census figures, this poststratification corrects a sample that is lopsided on characteristics you can measure. When you know the population totals for each variable separately but not for their combinations, a technique called raking adjusts the weights in repeated passes until they match every margin at once.
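To make the arithmetic concrete, here is a minimal sketch in plain Python. Every number in it is made up for illustration, and the function names (poststratification_weights, rake) are my own, not from any library. The raking routine is the textbook iterative proportional fitting loop on a two-way table, and it assumes every cell has at least one respondent.

```python
def poststratification_weights(sample_share, population_share):
    """Weight per group = population share / sample share."""
    return {g: population_share[g] / sample_share[g] for g in sample_share}


def rake(counts, row_targets, col_targets, iterations=100):
    """Iterative proportional fitting on a 2-D table of positive cell counts.

    Alternately rescales rows and columns so both sets of margins match.
    """
    table = [list(row) for row in counts]
    for _ in range(iterations):
        # Rescale each row so it sums to its target.
        for i, row in enumerate(table):
            row_sum = sum(row)
            table[i] = [x * row_targets[i] / row_sum for x in row]
        # Rescale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
    return table


# Hypothetical shares: the young at twice their population share, the old at half.
population_share = {"young": 0.30, "middle": 0.50, "older": 0.20}
sample_share = {"young": 0.60, "middle": 0.30, "older": 0.10}
weights = poststratification_weights(sample_share, population_share)
# Young respondents now count for less than one person each, older ones for more.

# Hypothetical sample counts by age (rows) and sex (columns), raked to
# made-up population margins that are known only one variable at a time.
sample_counts = [[300, 300],   # young: men, women
                 [150, 250]]   # older: men, women
age_targets = [400, 600]
sex_targets = [490, 510]
raked = rake(sample_counts, age_targets, sex_targets)

# Each respondent's raking weight is the raked cell total over the raw cell count.
cell_weights = [[raked[i][j] / sample_counts[i][j] for j in range(2)]
                for i in range(2)]
```

In real work you would reach for a survey package rather than hand-rolled loops, but the loop shows what the package is doing: nothing more than repeated rescaling.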

The more powerful modern approach is a mouthful, multilevel regression and poststratification, mercifully shortened to MRP. Instead of only reweighting, you build a model that predicts the outcome from a person’s characteristics, then use it to estimate the answer for every demographic cell, and finally reassemble those cells in the exact proportions of the real population. The multilevel part quietly stabilizes the thin cells, where you have few respondents, by borrowing strength from the rest of the data.
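The two halves of MRP can be sketched in a few lines, with one loud caveat: the code below does not fit a multilevel regression. As a simplified stand-in, it shrinks each cell's raw rate toward the overall rate, more strongly when the cell is thin, which imitates the borrowing of strength a real multilevel model provides. The counts, the prior_strength value, and the function names are all assumptions for illustration.

```python
def partial_pool(cell_yes, cell_n, prior_strength=10.0):
    """Shrink each cell's raw rate toward the overall rate.

    A stand-in for the multilevel model: thin cells are pulled hardest,
    as if each cell started with `prior_strength` pseudo-respondents
    answering at the overall rate.
    """
    grand_mean = sum(cell_yes.values()) / sum(cell_n.values())
    return {
        cell: (cell_yes[cell] + prior_strength * grand_mean)
              / (cell_n[cell] + prior_strength)
        for cell in cell_n
    }


def poststratify(cell_estimates, population_counts):
    """Reassemble cell estimates in the population's proportions."""
    total = sum(population_counts.values())
    return sum(cell_estimates[c] * population_counts[c]
               for c in population_counts) / total


# Hypothetical opt-in sample: heavy on young men, thin on older women.
cell_n = {("young", "man"): 500, ("young", "woman"): 60,
          ("older", "man"): 35, ("older", "woman"): 5}
cell_yes = {("young", "man"): 300, ("young", "woman"): 30,
            ("older", "man"): 14, ("older", "woman"): 1}
# Hypothetical population counts (in thousands) for the same cells.
population_counts = {("young", "man"): 200, ("young", "woman"): 200,
                     ("older", "man"): 300, ("older", "woman"): 300}

raw_rate = sum(cell_yes.values()) / sum(cell_n.values())
pooled = partial_pool(cell_yes, cell_n)
estimate = poststratify(pooled, population_counts)
```

With five respondents, the older-women cell has a raw rate of 0.20, and pooling moves it most of the way toward the overall rate. A real analysis would replace partial_pool with a fitted multilevel regression, so that the pooling depends on the predictors and not on a single hand-picked constant.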

How well can this work? Consider the field’s most striking example. In 2012, researchers ran an opt-in election poll on the Xbox gaming platform. The sample was about as unrepresentative as you can imagine: roughly nine in ten respondents were men, and most were under thirty. The raw numbers were useless. Yet after MRP, the adjusted estimates tracked the professional poll aggregators and the actual result. A sample of gamers, properly modeled, measured the national electorate. MRP is now a standard tool, used for district-level forecasts and built into major media models. Even the doomed 1936 magazine poll could likely have been rescued, because it had quietly collected how people voted in the prior election, the kind of variable that makes adjustment work.

So statistics can carry a biased sample a long way. Now the catch, which is the whole point. These methods can only correct for things you can measure and match to the population, and they all rest on one assumption: that within each group you adjust for, the people you have are a fair stand-in for the people you are missing. Weight by age, sex, and region, and you are betting that a missing person resembles, on the outcome, the respondents of the same age, sex, and region. If the people your frame never reached differ in some way you did not measure, no weighting scheme can recover them. The information simply is not there.
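A toy calculation shows the limit. The population below is entirely hypothetical: outcome rates depend on age and also on a trait the survey never measured, whether the frame could reach the person at all. The sample contains only reached people, and I weight on age alone.

```python
# Hypothetical population: shares and outcome rates by age and by an
# unmeasured trait (whether the sampling frame could reach the person).
population = {
    ("young", "reached"):   {"share": 0.30, "rate": 0.60},
    ("young", "unreached"): {"share": 0.20, "rate": 0.30},
    ("older", "reached"):   {"share": 0.20, "rate": 0.50},
    ("older", "unreached"): {"share": 0.30, "rate": 0.20},
}
true_rate = sum(c["share"] * c["rate"] for c in population.values())

# The sample holds only reached people, and it skews young.
sample_age_share = {"young": 0.80, "older": 0.20}
sample_rate_by_age = {age: population[(age, "reached")]["rate"]
                      for age in sample_age_share}
raw_estimate = sum(sample_age_share[a] * sample_rate_by_age[a]
                   for a in sample_age_share)

# Poststratify on age, the only variable that was measured.
population_age_share = {}
for (age, _), cell in population.items():
    population_age_share[age] = population_age_share.get(age, 0.0) + cell["share"]
weighted_estimate = sum(population_age_share[a] * sample_rate_by_age[a]
                        for a in sample_age_share)

print(f"truth {true_rate:.2f}, raw {raw_estimate:.2f}, "
      f"weighted {weighted_estimate:.2f}")
```

Weighting fixes the age imbalance and moves the estimate slightly toward the truth, but most of the error survives, because within each age group the reached respondents are not a fair stand-in for the unreached.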

There is a second, quieter cost. Heavy weighting throws away precision. When a handful of rare respondents must each stand for many, your effective sample shrinks and your uncertainty grows, even if the headline number looks fine. Extreme weights are a red flag, not a triumph.
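One common way to put a number on that shrinkage is Kish's approximate effective sample size, the squared sum of the weights divided by the sum of the squared weights. I am bringing that formula in as an illustration; the weights below are made up.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical sample of 1,000: 900 overrepresented respondents weighted
# down to 0.5 each, 100 rare respondents weighted up to 5.5 each.
weights = [0.5] * 900 + [5.5] * 100
n_eff = kish_effective_sample_size(weights)

# With equal weights, nothing is lost.
n_equal = kish_effective_sample_size([1.0] * 1000)

print(f"nominal n = {len(weights)}, effective n = {n_eff:.0f}")
```

In this example a nominal sample of 1,000 behaves more like one of roughly 300 for precision purposes. The approximation is a rule of thumb, not an exact variance calculation, but it is a quick check on whether a few extreme weights are carrying the estimate.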

For those working with administrative data, opt-in panels, and convenience samples, the practical bottom line is this. Weighting and MRP can turn an imperfect sample into a defensible estimate, but only with good population benchmarks and a credible case that what is missing is captured by what you measured. The method is part mathematics, part honesty about that assumption.

So here is my question: When you weight or model your way from a skewed sample to a population estimate, how do you decide whether the variables you adjusted for are enough, and how do you disclose who your numbers still cannot speak for?
