Stephen T’s Blog Spot
A blog aimed at issues only data scientists, data analysts, statisticians, evaluators, and researchers care about.
Category: Uncategorized
-
There is a piece of advice that sounds responsible and is sometimes exactly wrong. When you worry that a comparison might be confounded, the reflex is to control for more variables, to put everything into the model just to be safe. But some variables do not remove bias when you control for them. They create…
-
A note before I begin. This post is a departure from my usual focus on research methods and evaluation. I am writing it for a simple reason: I find quantum computing genuinely fascinating, and the claims swirling around it and the social sciences have grown loud enough that I wanted to sort fact from fiction…
-
There is a comforting belief worth dismantling. When a dataset is huge (millions of records), we relax. Surely something that large is representative of the population. It is one of the most expensive intuitions in modern data work, because size and representativeness are not the same thing. A very large sample can be more confidently…
-
We pour enormous effort into getting the methods right: the design, the analysis, the careful caveats. Then the report is delivered, politely thanked, and set on a shelf, where nothing happens to it. In evaluation, the most common failure is not a flawed method. It is irrelevance. A technically flawless study that changes no decision…
-
Last time I argued that the heart of a credible causal claim is hunting for the explanations you would rather were not true. That is harder than it sounds, and the reason is the most human bias in all of research: we instinctively look for evidence that we are right, not evidence that we are…
-
You are asked the question every program eventually faces: what difference did it make? But there is no control group. You could not randomize. A dozen other things were happening at the same time: the economy, other programs, the simple passage of time. The counterfactual methods I have written about (matching, regression discontinuity, difference-in-differences) all…
-
Almost every federal program and grant proposal asks for a logic model or theory of change. And most of the time, one gets drawn, made to look tidy, approved, and never opened again. It becomes paperwork, a diagram that satisfied a reviewer. That is a waste, because used as intended, a theory of change is…
-
We are swimming in aggregate data: averages by county, by ZIP code, by store, by team, by region. It is everywhere, it is cheap, and it is tempting to read it as if it described the people inside each group. That temptation has a name, the ecological fallacy, and it is one of the oldest…
-
We have been working through selection bias, and last time I covered propensity-score matching, which rebuilds comparable groups but only on the characteristics you managed to measure. There is a design that, where it applies, does something matching cannot: it balances even the things you did not measure. It is beloved in the methods literature,…
-
Last time I described selection bias: when people choose into a program, the participants and non-participants differ before anything happens, so a naive comparison measures the people, not the program. Randomization would solve it, but often we cannot randomize. So how do you rebuild a fair comparison from data where the groups were never balanced?…
-
Picture a voluntary job-training program. A year later, the people who enrolled are earning more than the people who did not. Success, right? Maybe. But ask a harder question first: who signs up for a training program? Often the more motivated, the more employable, the people already on their way up. If the participants were…
-
In a previous post I argued that missing data is rarely random, that deleting or averaging it away quietly biases your results, and that you usually cannot prove whether the gaps are the benign kind or the dangerous kind. That is a bleak place to stop, so here is the repair. It comes in two…
-
Every real dataset has holes. People skip the sensitive question, drop out of the study, or are never measured at all. A sensor fails, a record is incomplete, a field is blank. And most of the time, without quite deciding to, we handle those holes in one of two ways: we delete the rows that…
-
In another post I argued that a giant sample drawn from a skewed frame is still wrong, and that size cannot save it. That raises the obvious question. If you are stuck with an imperfect sample, and most of us usually are, can statistics fix it after the fact? The honest answer is: often, and…
-
In 1936, a magazine ran the largest election poll the world had ever seen. The Literary Digest mailed out around ten million ballots and tallied roughly two and a half million that came back. On that mountain of data, it confidently predicted that Alf Landon would defeat Franklin Roosevelt. Roosevelt then won in one of…
-
The word comes from navigation. A sailor cannot fix position from a single landmark; one bearing tells you the direction to the lighthouse, not where you are. Take a bearing on a second landmark, and a third, and the lines cross at one point. That crossing is your location. No single sighting could have given…
-
There is an old joke about a man searching for his lost keys under a streetlight. A passerby asks where he dropped them. “Over in the park,” he says, “but the light is better here.” We laugh, and then we go build our performance dashboards the same way. This is the streetlight effect, and its…
-
In another post I described difference-in-differences, the design that estimates a policy’s effect by comparing the change in a treated group to the change in an untreated one. Today I want to build on it, because there is a more powerful relative of that design, one I have leaned on often: the controlled interrupted time…
-
In a separate post, I wrote about synthetic controls, a clever way to build a comparison group when you do not have one. Today, its simpler and far more common cousin, a workhorse of policy evaluation: difference-in-differences. Start with the problem it solves. A policy changes in one place and not another. One state raises…
-
Last time I argued that mixed methods is really about integration. There is one word that often stands in for that integration and quietly does a great deal of unearned work: triangulation. People write ‘we triangulated’ as though it settles the question of whether a finding can be trusted. By itself, it does not. Start…