Wednesday, November 1, 2017

The bane of small, non-random sample sizes

A couple of excellent examples of an issue I tend to bang on about - the habit in research, and especially social sciences research, of using sample sizes that are too small and samples that are not random. The findings of a sociology/psychology experiment with 23 subjects drawn from self-selected, upper-income, westernized, elite university students interested in sociology/psychology tell you very, very little. If you want a measure of humans, not rich, western, high-IQ, highly motivated humans, then you need sample sizes in the thousands or tens of thousands, and the sample has to be truly random, not self-selected.
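
To see why a truly random sample matters even more than sheer size, here is a back-of-the-envelope simulation (all numbers are invented for illustration). Suppose the true population mean of some trait is 100, but the people who volunteer for a study average 108. Drawing more volunteers shrinks the noise around 108; it does nothing to close the gap to 100:

import random

random.seed(1)

TRUE_MEAN = 100        # the population value we actually want
VOLUNTEER_MEAN = 108   # self-selected volunteers differ systematically
SD = 15

def sample_mean(mean, n):
    """Average of n draws from a normal(mean, SD) population."""
    return sum(random.gauss(mean, SD) for _ in range(n)) / n

for n in (25, 1000, 100000):
    random_est = sample_mean(TRUE_MEAN, n)          # truly random sample
    volunteer_est = sample_mean(VOLUNTEER_MEAN, n)  # self-selected sample
    print(f"n={n:>6}: random {random_est:6.1f}, volunteer {volunteer_est:6.1f} (truth is {TRUE_MEAN})")

# The random estimate converges on 100 as n grows; the volunteer estimate
# converges on 108. More data cannot repair a biased sampling frame.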

From How a Focus on Rich Educated People Skews Brain Studies by Ed Yong.
Neuroscience faces the same problems. When scientists use medical scanners to repeatedly peer at the shapes and activities of the human brain, those brains tend to belong to wealthy and well-educated people. And unless researchers take steps to correct for that bias, what we get is an understanding of the brain that’s incomplete, skewed, and, well, a little weird.

Kaja LeWinn, from the University of California, San Francisco, demonstrated this by reanalyzing data from a large study that scanned 1,162 children ages 3 to 18 to see how their brain changed as they grew up. The kids came from disproportionately wealthy and well-educated families, so LeWinn adjusted the data to see what it would look like if they had been more representative of the U.S. population. That's called “weighting,” and it’s a common strategy that epidemiologists use to deal with skews in their samples. As an easy example, if you ended up recruiting twice as many boys as girls, you’d assign the girls twice as much “weight” as the boys.

When LeWinn weighted her data for factors such as sex, ethnicity, and wealth, the results looked very different from the original set. The brain as a whole developed faster than previously thought, and some parts matured earlier relative to others.

[snip]

For example, in the study she reanalyzed, around 35 percent of the children had parents with college backgrounds, and around 38 percent had parents who earned more than $100,000 a year. If the sample had been truly representative of the U.S. population, those proportions would have been 11 percent and 26 percent, respectively. And weighting the data to account for these biases produced a different picture of brain development.

Brains get bigger as we get older, before shrinking again during later childhood. In the unweighted data, the brains hit their peak volume at 6 years on average, and their peak surface area at around 12 years. But in the weighted data, the brains hit those milestones 10 months and 29 months earlier, respectively. The pattern of development across the brain also changed. In the unweighted data, three of the brain’s four lobes hit their maximum area from the ages of 12 to 13, with only the parietal lobe peaking earlier at around 10 years. But the weighted data showed more of a wave of maturation, from the back of the brain to its front, and going from 9 years to 11.
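
The "weighting" Yong describes is, at bottom, just averaging with each group scaled by its population share rather than its sample share. A minimal sketch in the spirit of the boys-and-girls example, with made-up group sizes and scores (none of these figures come from the study):

# Hypothetical sample: boys were recruited at twice the rate of girls,
# even though the population is split 50/50.
sample = {
    # group: (share of sample, share of population, group mean score)
    "boys":  (2 / 3, 0.5, 90.0),
    "girls": (1 / 3, 0.5, 100.0),
}

# Unweighted estimate: every respondent counts equally, so the
# over-recruited boys dominate the average.
unweighted = sum(sample_share * mean
                 for sample_share, _, mean in sample.values())

# Weighted estimate: each group is rescaled to its population share,
# so each under-recruited girl counts twice as much as each boy.
weighted = sum(pop_share * mean
               for _, pop_share, mean in sample.values())

print(f"unweighted mean: {unweighted:.1f}")  # 93.3 - skewed toward boys
print(f"weighted mean:   {weighted:.1f}")    # 95.0 - matches the population mix
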
Another example of how failures of randomization and sample size change the outcome of a study:
Jim Coan, from the University of Virginia, learned the same lesson in his own work. A decade ago, he put 16 women in a brain scanner, promised to give them an electric shock, and looked at parts of their brains that respond to threats. He found that these areas are less active if the women held the hand of a stranger, even less active if they held their spouse’s hand, and less active still if they were in an especially happy relationship. “I had to raise $30,000 to do that experiment and everyone was white, wealthy, well educated. And yet, we thought: Here’s the story,” he says. “By yourself, you’re maximally responsible for meeting the demand of the threatening situation so you have more of a threat response. If you’re with your trusted romantic partner, you’re minimally responsive because you outsource.”

Years later, he got more money to do a bigger and more representative study of racially and socioeconomically diverse people drawn from the local community. “And the findings changed,” he says. The romantic partners still reduced the threat response, but a stranger’s hand had no effect at all. Why? Perhaps it’s because, as he showed in another study, the wealth of the neighborhood you grow up in affects the way your brain weighs up rewards and threats. “This shouldn’t surprise anybody,” he says. “The context in which you develop shapes the way your brain functions and probably the way it’s structured.”
Sample sizes and randomization are Statistics 101. The fact that we have many decades of assumptions, inferences, and policies based on underpowered and non-random studies is a crying shame. Virtually every failed policy with unintended negative consequences can be traced to ill-founded conclusions based on naive studies.
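
"Underpowered" has a concrete meaning: with too few subjects, a study is unlikely to detect an effect even when the effect is real. A standard back-of-the-envelope calculation (normal approximation, two equal groups, 5 percent two-sided test, 80 percent power; the effect sizes below are conventional benchmarks, not measurements):

from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two means,
    where effect_size is the mean difference divided by the SD."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_power = NormalDist().inv_cdf(power)          # about 0.84
    return 2 * (z_alpha + z_power) ** 2 / effect_size ** 2

for d in (0.2, 0.5, 0.8):  # conventional "small", "medium", "large" effects
    print(f"effect size {d}: about {n_per_group(d):.0f} subjects per group")

# effect size 0.2: about 392 subjects per group
# effect size 0.5: about 63 subjects per group
# effect size 0.8: about 25 subjects per group

By this yardstick, a between-groups comparison with a couple dozen subjects in total is adequate only for implausibly large effects.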

Of course, the real driver is not so much incompetent researchers (though incompetence is real) but rather cherry-picking advocates. The political process wants to go through the motions of rationality and will always find some anemic study to support whatever the vested interests/ideological needs might be. The demand comes from enthusiastic advocates, and academics are the tepid suppliers. The way to break this corrupt cycle is for an informed citizenry to push back on the lazy arguments of advocates and demand real information before decisions are made.

UPDATE: A new study out yesterday: Is mindfulness research methodology improving over time? A systematic review by Samuel B. Goldberg, et al. Another example of how motivated research on the cheap muddies the social discussion with cognitive pollution. From the abstract:
Background

Despite an exponential growth in research on mindfulness-based interventions, the body of scientific evidence supporting these treatments has been criticized for being of poor methodological quality.

Objectives

The current systematic review examined the extent to which mindfulness research demonstrated increased rigor over the past 16 years regarding six methodological features that have been highlighted as areas for improvement. These features included using active control conditions, larger sample sizes, longer follow-up assessment, treatment fidelity assessment, and reporting of instructor training and intent-to-treat (ITT) analyses.

[snip]

Results

Across the 142 studies published between 2000 and 2016, there was no evidence for increases in any study quality indicator, although changes were generally in the direction of improved quality. When restricting the sample to those conducted in Europe and North America (continents with the longest history of scientific research in this area), an increase in reporting of ITT analyses was found. When excluding an early, high-quality study, improvements were seen in sample size, treatment fidelity assessment, and reporting of ITT analyses.
In 16 years, despite the methodological shortcomings being well known, there has been hardly any effort to actually improve the quality of studies. Much time, leadership effort, and organizational energy have been poured into the mindfulness fad with no real evidence that its claims are true.
