Friday, March 24, 2023

Observational studies must address all four quadrants of the Rumsfeld matrix

I not infrequently disagree with Emily Oster, usually not because I believe she is necessarily wrong but because she often seems to lack confidence in her empirical approach.  I endorse the scientific method, while recognizing that it is more challenging than is often acknowledged.  We should be bold in our confidence in the scientific method even while acknowledging that the world is complex and does not easily let go of its secrets and mysteries.  

Through the Covid pandemic, there were many public health issues where the federal approach was either explicitly wrong based on past experience or at best resting on dubious grounds.  I would have wished Oster to have been more forthright, and earlier, in her criticism, because the science warranted it.  But she generally stayed muted and conformed.  

So it is nice to find a piece by her that I wholeheartedly endorse.  In this instance, on the care that needs to be taken when concluding anything from observational studies.  They can be clarifying and suggestive.  They are rarely dispositive or sufficient to reach strong conclusions.

A question I get frequently: Why does my analysis often disagree with groups like the American Academy of Pediatrics or other national bodies, or other public health experts, or Andrew Huberman (lately I get that last one a lot)? The particular context is often in observational studies of topics in nutrition or development.

[snip]

The questioner essentially notes: the reason we know that the processed food groups differ a lot is that the authors can see the characteristics of individuals. But because they see these characteristics, they can adjust for them (using statistical tools). While it’s true that education levels are higher among those who eat less processed food, by adjusting for education we can come closer to comparing people with the same education level who eat different kinds of food.

However, in typical data you cannot observe and adjust for all differences. You do not see everything about people. Sometimes this is simply because our variables are rough: we see whether someone has a family income above or below the poverty line, but not any more details, and those details are important. There are also characteristics we almost never capture in data, like How much do you like exercise? or How healthy are your partner’s behaviors? or even Where is the closest farmers’ market? 

For both of these reasons, in nearly all examples, we worry about residual confounding. That’s the concern that there are still other important differences across groups that might drive the results. Most papers list this possibility in their “limitations” section. 

We all agree that this is a concern. Where we differ is in how much of a limitation we believe it to be. In my view, in these contexts (and in many others), residual confounding is so significant a factor that it is hopeless to try to learn causality from this type of observational data. 

Indeed.  And this is not inconsequential.  Whenever there is a disparate impact study, it is intended by design to control for confounding variables so that any remaining variance in outcomes can be attributed to discrimination.  For decades it was a matter of ideological faith that women were discriminated against in the market economy because they were women, a claim infamously known as the Gender Pay Gap.

And for a decade or two now, we have known that the Gender Pay Gap is entirely an artifact of confounding variables.  Once you control for educational attainment, hours worked, the number and duration of absences from the workforce, field of endeavor, etc., there is no gap.  As economic theory would suggest, people, regardless of sex, are paid the same for the same type of work with the same experience.  
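The logic of "controlling for" is easy to see in a toy simulation (my illustration, with made-up coefficients, not drawn from any actual wage data): build wages that depend only on hours and field, let those two variables differ by sex on average, and compare the raw gap with the adjusted one.

```python
# Hypothetical simulation (my illustration, not real wage data):
# wages depend only on hours and field, which differ by sex on
# average, so a raw comparison shows a "gap" that vanishes once
# we control for them.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

female = rng.integers(0, 2, size=n).astype(float)
hours = 40 - 5 * female + rng.normal(scale=3, size=n)  # fewer hours on average
field = rng.normal(size=n) - 0.5 * female              # field pay-premium proxy

# True wage equation: the coefficient on sex is exactly zero.
wage = 2.0 * hours + 4.0 * field + rng.normal(scale=5, size=n)

def coef_on_first(y, X):
    """OLS coefficient on the first column of X, intercept included."""
    X = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]

raw_gap = coef_on_first(wage, female)
adjusted_gap = coef_on_first(wage, np.column_stack([female, hours, field]))

print(f"raw gap:      {raw_gap:.1f}")       # large and negative
print(f"adjusted gap: {adjusted_gap:.1f}")  # approximately zero
```

The same regression machinery, run on real earnings data with real controls, is what produces the "adjusted gap of roughly zero" finding.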

Think about the tens of thousands of hours of legislative and policy debate and the thousands of laws and regulations which have been passed or enacted to banish a problem that doesn't actually exist.  There is no Gender Pay Gap except in badly or inadequately designed observational studies in the context of an ideological conviction that any disparate impact must be attributable to conscious or unconscious discrimination.

What a waste of time.

Oster continues:

Conceptually, the gold standard for causality is a randomized controlled trial. In the canonical version of such a trial, researchers randomly allocate half of their participants to treatment and half to control. They then follow them over time and compare outcomes. The key is that because you randomly choose who is in the treatment group, you expect them, on average, to be the same as the control other than the presence of the treatment. So you can get a causal effect of treatment by comparing the groups.

Randomized trials are great but not always possible. A lot of what is done in public health and economics aims to estimate causal effects without randomized trials. The key to doing this is to isolate a source of randomness in some treatment, even if that randomization is not explicit.

[snip]

We can take this lens to the kind of observational data that we often consider. Let’s return to the processed food and cancer example. The approach in that paper was to compare people who ate a lot of processed food with those who ate less. Clearly, in raw terms, this would be unacceptable because there are huge differences across those groups. The authors argue, though, that once they control for those differences, they have mostly addressed this issue.

This argument comes down to: once I control for the variables I see, the choice about processed food is effectively random, or at least unrelated to other aspects of health.

I find this fundamentally unpalatable. Take two people who have the same level of income, the same education, and the same preexisting conditions, and one of them eats a lot of processed food and the other eats a lot of whole grains and fresh vegetables. I contend that those people are still different. That their choice of food isn’t effectively random — it’s related to other things about them, things we cannot see. Adding more and more controls doesn’t necessarily make this problem better. You’re isolating smaller and smaller groups, but still you have to ask why people are making different food choices.

Food is a huge part of our lives, and our choices about it are not especially random. Sure, it may be random whether I have a sandwich or a salad for lunch today, but whether I’m eating a bag of Cheetos or a tomato and avocado on whole-grain toast — that is simply not random and not unrelated to other health choices.

This is where, perhaps, I conceptually differ from others. I have to imagine that researchers doing this work do not hold this view. It must be that they think that once we adjust for the observed controls, the differences across people are random, or at least are unrelated to other elements of their health.    
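Her objection can be made concrete with a toy simulation (mine, not hers; every variable is hypothetical): one confounder we observe and adjust for, one we do not, and a true diet effect of exactly zero.

```python
# Sketch (not from the post): residual confounding in miniature.
# "education" is an observed confounder we can adjust for;
# "mindedness" is an UNOBSERVED trait driving both diet and outcome.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

education = rng.normal(size=n)   # observed confounder
mindedness = rng.normal(size=n)  # unobserved confounder

# Diet depends on both confounders; its true effect on the outcome is 0.
processed_food = education + mindedness + rng.normal(size=n)
outcome = 2.0 * education + 2.0 * mindedness + rng.normal(size=n)

def ols_slope(y, X):
    """OLS coefficient on the first column of X, intercept included."""
    X = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]

naive = ols_slope(outcome, processed_food)  # no controls at all
adjusted = ols_slope(outcome, np.column_stack([processed_food, education]))

print(f"naive estimate:    {naive:.2f}")     # far from the true effect of 0
print(f"adjusted estimate: {adjusted:.2f}")  # smaller, but still badly biased
```

Adjusting for education shrinks the bias but cannot remove it, because the unobserved trait is still doing its work, which is exactly the "effectively random once I control" assumption she rejects.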

She provides some good detail and illustrative examples.  Then she gets to the core issue:

The control sets we typically consider are incomplete. There are a lot of papers that report effectively only the first two bars in the graph above. But those simple observable controls are just not sufficient. The residual confounding is real and it is significant. 

It is all well and good to control for the known known confounding variables.  But we still have the known unknown, the unknown known, and the unknown unknown confounding variables in the Rumsfeld Matrix.

Oster is effectively pointing out that virtually all observational studies at best address only one of the four Rumsfeldian quadrants of epistemic uncertainty, the known knowns.  All the rest is terra incognita, rendering observational studies of little usefulness.  

The question of whether a controlled effect in observational data is “causal” is inherently unanswerable. We are worried about differences between people that we cannot observe in the data. We can’t see them, so we must speculate about whether they are there. Based on a couple of decades of working intensely on these questions in both my research and my popular writing, I think they are almost always there. I think they are almost always important, and that a huge share of the correlations we see in observational data are not close to causal.
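Her closing point is that only randomization breaks the link between the treatment and the unobservables.  A toy simulation (mine, not hers) makes it vivid: the same outcome model, a true treatment effect of 1.0, and two worlds, one where treatment is a coin flip and one where health-minded people select into it.

```python
# Toy sketch (my illustration): identical outcome model, true
# treatment effect of 1.0, but treatment is either randomized
# or self-selected by an unobservable health-minded trait.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
mindedness = rng.normal(size=n)  # the unobservable trait

def difference_in_means(treated):
    # True effect of treatment is 1.0; mindedness also raises the outcome.
    outcome = 1.0 * treated + 2.0 * mindedness + rng.normal(size=n)
    return outcome[treated == 1].mean() - outcome[treated == 0].mean()

randomized = (rng.random(n) < 0.5).astype(int)                 # coin flip
self_selected = (mindedness + rng.normal(size=n) > 0).astype(int)

rct_est = difference_in_means(randomized)
obs_est = difference_in_means(self_selected)

print(f"randomized estimate:    {rct_est:.2f}")  # close to the true 1.0
print(f"self-selected estimate: {obs_est:.2f}")  # badly inflated
```

The randomized arm recovers the truth because, on average, the coin flip is unrelated to mindedness; the observational arm cannot, and no amount of adjusting for observables fixes a variable the data never recorded.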




