Stephen T’s Blog Spot

A blog aimed at issues only data scientists, data analysts, statisticians, evaluators, and researchers care about.

about

I spend my time thinking about how the government can use research and evaluation methods to make evidence-informed decisions and policy. It can be a struggle! That is why I need your thoughts. Join me, have a read, and leave a comment. It's how we add to the evidence base.

Category: Uncategorized

The Variable You Should Not Control For

June 27, 2026

There is a piece of advice that sounds responsible and is sometimes exactly wrong. When you worry that a comparison might be confounded, the reflex is to control for more variables, to put everything into the model just to be safe. But some variables do not remove bias when you control for them. They create…
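One classic case of a control that creates bias is a collider: a variable caused by both the exposure and the outcome. The simulation below is a minimal sketch under a hypothetical setup that is not taken from the post: x has no effect on y, and both feed into a third variable c. Regressing y on x alone gives a slope near zero. Adding c as a control (done here through Frisch-Waugh-Lovell residualization in plain Python) produces a clearly negative slope.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, c):
    """Residuals of v after a simple linear regression on c."""
    b = slope(c, v)
    mv, mc = sum(v) / len(v), sum(c) / len(c)
    return [vi - mv - b * (ci - mc) for vi, ci in zip(v, c)]


rng = random.Random(42)
n = 50_000
x = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical exposure with NO effect on y
y = [rng.gauss(0, 1) for _ in range(n)]  # outcome, generated independently of x
c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]  # collider: caused by both x and y

naive = slope(x, y)  # close to 0, which is the truth
# Frisch-Waugh-Lovell: the coefficient on x in y ~ x + c equals the slope
# of the y-residuals on the x-residuals after both are regressed on c.
adjusted = slope(residualize(x, c), residualize(y, c))
print(f"without control: {naive:.3f}   controlling for c: {adjusted:.3f}")
```

Under this model the controlled slope settles near its theoretical value of -0.5, even though x does nothing to y.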
Quantum Computing and Social Science: Fact and Fiction

June 26, 2026

A note before I begin. This post is a departure from my usual focus on research methods and evaluation. I am writing it for a simple reason: I find quantum computing genuinely fascinating, and the claims swirling around it and the social sciences have grown loud enough that I wanted to sort fact from fiction…
A Bigger Sample Is Not a Better One

June 26, 2026

There is a comforting belief worth dismantling. When a dataset is huge, with millions of records, we relax. Surely something that large is representative of the population. It is one of the most expensive intuitions in modern data work, because size and representativeness are not the same thing. A very large sample can be more confidently…
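A small simulation makes the size-versus-representativeness point concrete. It assumes a hypothetical population in which only half of the people are on the sampling frame, and that half is more likely to answer "yes". All numbers are invented for illustration.

```python
import random

rng = random.Random(7)

# Hypothetical population of 200,000: half are on the sampling frame
# (the rest are simply not on the list), and that half leans "yes".
population = []
for _ in range(200_000):
    in_frame = rng.random() < 0.5
    p_yes = 0.6 if in_frame else 0.4
    population.append((in_frame, 1 if rng.random() < p_yes else 0))

answers = [yes for _, yes in population]
true_share = sum(answers) / len(answers)

# A huge sample, but drawn only from the skewed frame.
frame = [yes for in_frame, yes in population if in_frame]
big_biased = sum(rng.sample(frame, 50_000)) / 50_000

# A small simple random sample from the whole population.
small_random = sum(rng.sample(answers, 2_000)) / 2_000

print(f"truth {true_share:.3f} | 50,000 from frame {big_biased:.3f} | 2,000 at random {small_random:.3f}")
```

The 50,000-person sample lands near 0.6 while the 2,000-person random sample stays close to the true value near 0.5. Making the biased sample bigger only tightens it around the wrong number.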
Who Is This Evaluation Actually For? Utilization-Focused Evaluation

June 26, 2026

We pour enormous effort into getting the methods right: the design, the analysis, the careful caveats. Then the report is delivered, politely thanked, and set on a shelf, where nothing happens to it. In evaluation, the most common failure is not a flawed method. It is irrelevance. A technically flawless study that changes no decision…
The Bias That Feels Like Being Right: Confirmation Bias

June 26, 2026

Last time I argued that the heart of a credible causal claim is hunting for the explanations you would rather were not true. That is harder than it sounds, and the reason is the most human bias in all of research: we instinctively look for evidence that we are right, not evidence that we are…
Making the Case Without a Control Group: Contribution Analysis

June 26, 2026

You are asked the question every program eventually faces: what difference did it make? But there is no control group. You could not randomize. A dozen other things were happening at the same time: the economy, other programs, the simple passage of time. The counterfactual methods I have written about (matching, regression discontinuity, difference-in-differences) all…
Bad Idea, or Badly Delivered? The Theory of Change and Logic Models

June 26, 2026

Almost every federal program and grant proposal asks for a logic model or theory of change. And most of the time, one gets drawn, made to look tidy, approved, and never opened again. It becomes paperwork, a diagram that satisfied a reviewer. That is a waste, because used as intended, a theory of change is…
Groups Are Not Big People: The Ecological Fallacy

June 26, 2026

We are swimming in aggregate data. Averages by county, by ZIP code, by store, by team, by region. It is everywhere, it is cheap, and it is tempting to read it as if it described the people inside each group. That temptation has a name, the ecological fallacy, and it is one of the oldest…
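A toy dataset shows how far apart the group-level and individual-level pictures can be. This is a hypothetical construction: three groups of three people, where group averages rise together on both variables but, inside every group, people with higher x have lower y.

```python
def slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


# Hypothetical data: group g is shifted by 2*g on both variables;
# within a group, y falls one unit for every unit x rises.
groups = {g: [(2 * g + u, 2 * g - u) for u in (-1, 0, 1)] for g in range(3)}

group_x = [sum(x for x, _ in people) / len(people) for people in groups.values()]
group_y = [sum(y for _, y in people) / len(people) for people in groups.values()]
between = slope(group_x, group_y)  # relationship among group averages

within = [slope([x for x, _ in people], [y for _, y in people])
          for people in groups.values()]  # relationship among individuals, group by group

print("slope across group averages:", between)  # 1.0
print("slope inside each group:", within)  # [-1.0, -1.0, -1.0]
```

Reading the positive group-level slope as a statement about individuals would get the direction exactly backwards here.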
Just Above the Line, Just Below It: The Regression Discontinuity Design

June 26, 2026

We have been working through selection bias, and last time I covered propensity-score matching, which rebuilds comparable groups but only on the characteristics you managed to measure. There is a design that, where it applies, does something matching cannot: it balances even the things you did not measure. It is beloved in the methods literature,…
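The core of a sharp regression discontinuity can be sketched in a few lines: fit a line on each side of the cutoff, using only observations near it, and measure the gap between the two lines at the cutoff. This is a minimal sketch assuming simulated data with a cutoff at zero, a linear relationship on each side, and a true jump of 3; the bandwidth of 0.5 is an arbitrary choice, and real analyses usually pick it from the data.

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) of an OLS line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return my - b * mx, b


def sharp_rd(score, outcome, cutoff=0.0, bandwidth=0.5):
    """Fit a line on each side of the cutoff (within the bandwidth) and
    return the gap between the two fitted lines at the cutoff."""
    left = [(s - cutoff, y) for s, y in zip(score, outcome)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in zip(score, outcome)
             if cutoff <= s <= cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Hypothetical data: eligible if score >= 0; the true jump in the outcome is 3.
rng = random.Random(3)
score = [rng.uniform(-1, 1) for _ in range(20_000)]
outcome = [1 + 2 * s + 3 * (s >= 0) + rng.gauss(0, 1) for s in score]

estimate = sharp_rd(score, outcome)
print(f"estimated jump at the cutoff: {estimate:.2f}")
```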
Matching on a Single Number: Propensity-Score Matching

June 26, 2026

Last time I described selection bias: when people choose into a program, the participants and non-participants differ before anything happens, so a naive comparison measures the people, not the program. Randomization would solve it, but often we cannot randomize. So how do you rebuild a fair comparison from data where the groups were never balanced?…
Did the Program Work, or Did the Right People Join?

June 26, 2026

Picture a voluntary job-training program. A year later, the people who enrolled are earning more than the people who did not. Success, right? Maybe. But ask a harder question first: who signs up for a training program? Often the more motivated, the more employable, the people already on their way up. If the participants were…
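A quick simulation shows how self-selection alone can manufacture a "program effect". It assumes a hypothetical, unmeasured motivation trait that both raises earnings and makes people more likely to enroll, while the program itself is given no effect at all.

```python
import random

rng = random.Random(11)
enrolled, not_enrolled = [], []
for _ in range(100_000):
    motivation = rng.gauss(0, 1)  # hypothetical trait the analyst never sees
    joins = rng.random() < (0.7 if motivation > 0 else 0.3)  # motivated people enroll more
    # Earnings depend on motivation only: the program itself adds nothing.
    earnings = 30_000 + 5_000 * motivation + rng.gauss(0, 2_000)
    (enrolled if joins else not_enrolled).append(earnings)

gap = sum(enrolled) / len(enrolled) - sum(not_enrolled) / len(not_enrolled)
print(f"participants out-earn non-participants by about ${gap:,.0f}")
```

The true effect is zero by construction, yet the naive comparison shows participants earning roughly $3,000 more, all of it coming from who joined.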
Fill the Holes, Then Stress-Test Them: Multiple Imputation and Sensitivity Analysis

June 26, 2026

In a previous post I argued that missing data is rarely random, that deleting or averaging it away quietly biases your results, and that you usually cannot prove whether the gaps are the benign kind or the dangerous kind. That is a bleak place to stop, so here is the repair. It comes in two…
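After each imputed dataset is analysed, the results are usually combined with Rubin's rules: average the estimates, then add the average within-imputation variance to (1 + 1/m) times the between-imputation variance. The sketch below uses three made-up estimates and variances, not output from any real analysis.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m imputed-data results with Rubin's rules.
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up results from m = 3 imputed datasets (e.g. a regression coefficient).
est, var = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(f"pooled estimate {est:.3f}, standard error {var ** 0.5:.3f}")
```

The between-imputation term is what a single imputation hides: it carries the extra uncertainty that comes from not knowing the missing values.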
The Holes in Your Data Are Not Random: Missing Data Mechanisms (MCAR, MAR, MNAR)

June 26, 2026

Every real dataset has holes. People skip the sensitive question, drop out of the study, or are never measured at all. A sensor fails, a record is incomplete, a field is blank. And most of the time, without quite deciding to, we handle those holes in one of two ways: we delete the rows that…
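The difference between the benign and the dangerous kind of hole is easy to simulate. In this sketch (hypothetical income data, invented missingness rates), deleting incomplete rows is harmless when values are missing completely at random, but biases the mean downward when high earners are the ones who skip the question, a missing-not-at-random pattern. The MAR case is left out for brevity.

```python
import random

rng = random.Random(5)
n = 100_000
incomes = [rng.lognormvariate(10.5, 0.6) for _ in range(n)]  # hypothetical, right-skewed incomes
true_mean = sum(incomes) / n
median = sorted(incomes)[n // 2]

# MCAR: every answer has the same 30% chance of going missing.
mcar = [y for y in incomes if rng.random() >= 0.3]
# MNAR: above-median earners skip the question far more often (50% vs 10%).
mnar = [y for y in incomes if rng.random() >= (0.5 if y > median else 0.1)]

mcar_ratio = (sum(mcar) / len(mcar)) / true_mean
mnar_ratio = (sum(mnar) / len(mnar)) / true_mean
print(f"MCAR: complete-case mean is {mcar_ratio:.1%} of the true mean")
print(f"MNAR: complete-case mean is {mnar_ratio:.1%} of the true mean")
```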
Can Statistics Rescue a Biased Sample? Survey Weighting and MRP

June 25, 2026

In another post I argued that a giant sample drawn from a skewed frame is still wrong, and that size cannot save it. That raises the obvious question. If you are stuck with an imperfect sample, and most of us usually are, can statistics fix it after the fact? The honest answer is: often, and…
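Survey weighting, and the post-stratification step that gives MRP its "P", rests on a simple idea: estimate the outcome within cells, then reweight the cells to their known population shares. A minimal sketch, assuming a hypothetical sample in which urban respondents make up 80% of the sample but 50% of the population. It omits MRP's multilevel regression step, and it only removes bias if, within each cell, respondents resemble the people who were missed.

```python
def poststratify(sample, population_shares):
    """Average the outcome within each cell, then weight cells by their
    known population shares. sample: list of (cell, outcome) pairs."""
    cells = {}
    for cell, y in sample:
        cells.setdefault(cell, []).append(y)
    return sum(share * sum(cells[c]) / len(cells[c])
               for c, share in population_shares.items())


# Hypothetical survey: urban respondents are 80% of the sample,
# but (by assumption) only 50% of the population.
sample = ([("urban", 1)] * 480 + [("urban", 0)] * 320
          + [("rural", 1)] * 80 + [("rural", 0)] * 120)

raw = sum(y for _, y in sample) / len(sample)
weighted = poststratify(sample, {"urban": 0.5, "rural": 0.5})
print(f"raw share {raw:.2f}, post-stratified share {weighted:.2f}")
```

The raw share is 0.56; reweighting the urban (0.6) and rural (0.4) cell means to equal population shares gives 0.50.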
A Big Sample of the Wrong People: Coverage Error and the Sampling Frame

June 25, 2026

In 1936, a magazine ran the largest election poll the world had ever seen. The Literary Digest mailed out around ten million ballots and tallied roughly two and a half million that came back. On that mountain of data, it confidently predicted that Alf Landon would defeat Franklin Roosevelt. Roosevelt then won in one of…
Three Bearings to a Single Point: Triangulation

June 25, 2026

The word comes from navigation. A sailor cannot fix position from a single landmark; one bearing tells you the direction to the lighthouse, not where you are. Take a bearing on a second landmark, and a third, and the lines cross at one point. That crossing is your location. No single sighting could have given…
Looking Where the Light Is Good: The Streetlight Effect and the McNamara Fallacy

June 25, 2026

There is an old joke about a man searching for his lost keys under a streetlight. A passerby asks where he dropped them. “Over in the park,” he says, “but the light is better here.” We laugh, and then we go build our performance dashboards the same way. This is the streetlight effect, and its…
From Two Points to Two Trends: Controlled Interrupted Time Series

June 25, 2026

In another post I described difference-in-differences, the design that estimates a policy’s effect by comparing the change in a treated group to the change in an untreated one. Today I want to build on it, because there is a more powerful relative of that design, one I have leaned on often: the controlled interrupted time…
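One simple way to sketch the controlled interrupted time series idea, assuming noise-free hypothetical monthly series and a policy that starts at month 12, is to model the treated-minus-control gap, fit its pre-policy trend, project that trend forward, and average the post-policy departures. This is a simplification of the usual segmented regression, which estimates level and slope changes as separate terms. For contrast, the code also computes the before/after change in the average gap, which absorbs any divergence that was already under way.

```python
def fit_line(x, y):
    """Return (intercept, slope) of an OLS line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return my - b * mx, b


def cits_effect(treated, control, start):
    """Simplified controlled ITS: fit the pre-period trend in the
    treated-minus-control gap, project it forward, and average how far
    the post-period gap departs from that projection."""
    gap = [t - c for t, c in zip(treated, control)]
    a, b = fit_line(list(range(start)), gap[:start])
    post = range(start, len(gap))
    return sum(gap[m] - (a + b * m) for m in post) / len(post)


# Hypothetical, noise-free monthly series; the policy starts at month 12.
# The treated series was already pulling away by 0.1 per month.
control = [50 + 0.5 * m for m in range(24)]
treated = [52 + 0.6 * m + (4 if m >= 12 else 0) for m in range(24)]

effect = cits_effect(treated, control, start=12)
gap = [t - c for t, c in zip(treated, control)]
naive = sum(gap[12:]) / 12 - sum(gap[:12]) / 12  # before/after change in the average gap
print(f"CITS estimate {effect:.2f} vs. simple before/after gap change {naive:.2f}")
```

The built-in jump is 4. The simple before/after comparison reports 5.2 because it also picks up the 0.1-per-month drift that predates the policy.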
When One Place Changes and Another Does Not: Difference-in-Differences

June 25, 2026

In a separate post, I wrote about synthetic controls, a clever way to build a comparison group when you do not have one. Today, its simpler and far more common cousin, a workhorse of policy evaluation: difference-in-differences. Start with the problem it solves. A policy changes in one place and not another. One state raises…
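The core arithmetic of the basic two-group, two-period design fits in a few lines: the change in the treated group minus the change in the comparison group. The records below are hypothetical, and the result is only a causal estimate under the parallel-trends assumption.

```python
from statistics import mean

# Hypothetical records: (group, period, outcome).
records = [
    ("treated", "before", 10), ("treated", "before", 12),
    ("treated", "after", 15), ("treated", "after", 17),
    ("control", "before", 9), ("control", "before", 11),
    ("control", "after", 12), ("control", "after", 14),
]


def cell_mean(group, period):
    """Mean outcome for one group in one period."""
    return mean(y for g, p, y in records if g == group and p == period)


change_treated = cell_mean("treated", "after") - cell_mean("treated", "before")  # 16 - 11
change_control = cell_mean("control", "after") - cell_mean("control", "before")  # 13 - 10
did = change_treated - change_control
print(f"difference-in-differences estimate: {did}")
```

The treated group improved by 5 and the comparison group by 3, so the estimated effect is 2: the part of the treated group's change not explained by the shared trend.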
Triangulation Is Not a Validity Stamp

June 25, 2026

Last time I argued that mixed methods is really about integration. There is one word that often stands in for that integration and quietly does a great deal of unearned work: triangulation. People write ‘we triangulated’ as though it settles the question of whether a finding can be trusted. By itself, it does not. Start…
