Monday, October 2, 2023

The more confounding variables that you attempt to control for, the more you expand researcher degrees of freedom

From "Fine, I'll run a regression analysis. But it won't make you happy." by Nate Silver. The subheading is: "State partisanship and COVID vaccination rates are strongly predictive of COVID death rates even once you account for age."

One of my rules when I get in public debates as a statistician is: The simpler, the better. More or less, this is a version of Occam’s Razor. The more complications you introduce into an analysis, the more confounding variables that you attempt to control for, the more you expand researcher degrees of freedom — in other words, decision points by the analyst about how to run the numbers.

I don’t think it’s quite right to say these decisions are arbitrary. Ideally they’ll reflect a statistician’s judgment, experience and familiarity with the subject matter. Sometimes it’s absolutely necessary to control for confounders: otherwise you might wind up with ridiculous implications like that consuming ice cream causes drowning (summer weather is the obvious confounding variable). However, there are trade-offs when adding complications to your analysis. Although it’s possible to err in either direction, there’s a general tendency to overfit models.

My keep-it-simple attitude is also a stress response from years of experience arguing on the Internet. Any time you can make your point using simple counting statistics or other very straightforward methods, I consider that a win. People usually aren’t really interested in the intricacies beyond a certain point. Most of the time, what Scott Alexander calls “isolated demands for rigor” — there’s always some factor you haven’t accounted for — are just stepping stones on the road to confirmation bias.

So my aim is generally to focus on stylized facts that are true and robust. And to keep repeating them. I like simple (or simple-seeming) claims that — and I can’t emphasize this last part enough — I expect will hold up to scrutiny.
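To put rough numbers on the degrees-of-freedom point, here is a back-of-the-envelope sketch (my own illustration, not Silver's method). With k candidate confounders there are 2^k possible sets of controls. If an analyst tries several of those specifications and reports any one that clears p < 0.05, the chance of a false positive climbs quickly. The sketch treats each specification as an independent test at alpha = 0.05. That is a simplifying assumption: real specifications reuse the same data and are correlated, so read the output as a rough ceiling rather than an estimate. The covariate names in the usage lines are hypothetical.

```python
from itertools import combinations


def control_sets(covariates):
    """Every subset of candidate covariates an analyst could choose to control for."""
    return [set(c) for r in range(len(covariates) + 1)
            for c in combinations(covariates, r)]


def familywise_false_positive_rate(m, alpha=0.05):
    """Chance that at least one of m null tests comes out 'significant',
    ASSUMING the tests are independent (a simplification; real
    specifications are correlated, so this is a rough ceiling)."""
    return 1 - (1 - alpha) ** m


# Hypothetical covariates, added one at a time.
covariates = ["age", "income", "region", "education", "urbanicity"]
for k in range(len(covariates) + 1):
    m = len(control_sets(covariates[:k]))
    print(f"{k} covariates -> {m} specifications, "
          f"P(any false positive) ~ {familywise_false_positive_rate(m):.3f}")
```

The count doubles with every candidate confounder, which is why each extra control is a choice point and not just a refinement.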

Related to John von Neumann's remark on curve fitting:

With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.
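A minimal sketch of von Neumann's point, under a made-up setup: the true relationship is a straight line (y = 1 + 2x) plus Gaussian noise, and we fit polynomials of increasing degree to 10 evenly spaced points. Extra parameters always shrink the in-sample error, but beyond the right degree they mostly fit noise, so error on fresh holdout data rises. The function names and parameter values are illustrative assumptions. The solver uses the plain normal equations, which is adequate at these small degrees but numerically fragile for large ones.

```python
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit: solve (X^T X) b = X^T y by Gaussian
    elimination with partial pivoting. Returns coefficients, lowest power first."""
    p = degree + 1
    X = [[x ** j for j in range(p)] for x in xs]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back-substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef


def predict(coef, x):
    return sum(c * x ** j for j, c in enumerate(coef))


def mse(coef, xs, ys):
    return sum((predict(coef, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def truth(x):
    return 1.0 + 2.0 * x  # hypothetical linear ground truth


def compare(degrees=(1, 3, 8), n_train=10, n_test=200, noise=0.3, trials=50, seed=0):
    """Average training and holdout MSE for each polynomial degree."""
    rng = random.Random(seed)
    xs = [-1 + 2 * i / (n_train - 1) for i in range(n_train)]  # evenly spaced on [-1, 1]
    train_err = {d: 0.0 for d in degrees}
    test_err = {d: 0.0 for d in degrees}
    for _ in range(trials):
        ys = [truth(x) + rng.gauss(0, noise) for x in xs]
        tx = [rng.uniform(-1, 1) for _ in range(n_test)]
        ty = [truth(x) + rng.gauss(0, noise) for x in tx]
        for d in degrees:
            coef = fit_polynomial(xs, ys, d)
            train_err[d] += mse(coef, xs, ys) / trials
            test_err[d] += mse(coef, tx, ty) / trials
    return train_err, test_err


train, test = compare()
for d in sorted(train):
    print(f"degree {d}: train MSE {train[d]:.3f}, holdout MSE {test[d]:.3f}")
```

Averaging over many simulated datasets (`trials`) keeps a single lucky or unlucky draw from deciding the comparison.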

Another person growing increasingly skeptical of the integrity of university research data: Nobel Prize winner Daniel Kahneman.

In an interview, Dr. Kahneman, the Nobel Prize winner, suggested that while the efforts of scholars like the Data Colada bloggers had helped restore credibility to behavioral science, the field may be hard-pressed to recover entirely.

“When I see a surprising finding, my default is not to believe it,” he said of published papers. “Twelve years ago, my default was to believe anything that was surprising.”
