Tuesday, January 12, 2021

To have enough statistical power to detect the typically expected sex effect a study would require 134 mice, but the average study the researchers looked at included only twenty-two mice.

From Science Fictions by Stuart Ritchie, page 136.

As we saw in the last chapter, a p-value describes the chance that we’d find results that looked like ours (or even more impressive results) if in fact there was nothing going on, so we usually want it to be as low as possible (at least, lower than the standard threshold, normally set at 0.05). On the other hand, statistical power describes how likely we are to see a statistically significant signal when it really is there, so we want it to be as high as possible. Smaller effects – weaker signals – are far trickier to detect when you don’t have much data, so usually the more nuanced the effect you’re looking for, the bigger the sample that’s going to be required.
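A quick way to get a feel for what "power" means is to simulate it. The sketch below is mine, not the book's: it runs many fake experiments in which the effect genuinely exists (two groups of normal data whose means differ by a chosen standardized effect size), applies a simple two-sample z-test, and counts how often the result comes out significant. That fraction is the study design's power.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)

def simulated_power(effect_size, n_per_group, alpha=0.05, trials=2000):
    """Fraction of simulated experiments reaching p < alpha when the
    effect is genuinely present (two-sample z-test, sd = 1 per group)."""
    nd = NormalDist()
    hits = 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n_per_group)]
        b = [random.gauss(effect_size, 1) for _ in range(n_per_group)]
        # Difference of means has variance 1/n + 1/n = 2/n when sd = 1.
        z = (mean(b) - mean(a)) / sqrt(2 / n_per_group)
        p = 2 * (1 - nd.cdf(abs(z)))  # two-sided p-value
        if p < alpha:
            hits += 1
    return hits / trials
```

With a large effect (d = 0.8) and only six participants per group, the effect is real every single time, yet most simulated studies miss it; raise the group size to twenty-six or so and roughly four in five studies find it.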
 
Here’s a more concrete way to think about it. In 2013 the psychologist Joseph Simmons and colleagues asked an online sample of participants to answer a set of questions about their preferences in areas such as food and politics, and also collected their basic demographics (gender, age, height, and so on). Simmons then split the sample into various groups (such as male versus female, or liberal versus conservative), and noted how much the groups differed on a selection of variables. From there, he worked out how many participants you would need to be confident that you could detect a given difference, if you didn’t already know it existed.  For instance, it turns out you could reliably establish our now-familiar link between height and sex – that men are taller than women on average – with just six men and six women from the survey; this effect, as we know, is large and therefore obvious (our twenty-person study from the previous chapter, therefore, had high statistical power). Another straightforward one: ‘Do the older people in the sample tend to say that they’re closer to retirement age?’ They do, and Simmons found that you would only need nine older and nine younger people to detect this. But here are some effects that would require larger numbers of participants to be detected:
  • People who like spicy food are more likely to like Indian food (twenty-six spice-likers and twenty-six spice-dislikers needed).
  • Liberals tend to think social justice is more important than do conservatives (thirty-four of each political persuasion needed).
  • Men weigh more than women on average (forty-six of each sex needed).

The point of the exercise was to get scientists to be realistic about the size of the effect they’re looking for in any given study and, therefore, the sample size they would need for their results to be meaningful. If your sample size wouldn’t be enough for a reliable test of ‘do men weigh more than women?’, it probably doesn’t have enough statistical power to detect the specific esoteric effect that’s implied by your theory.
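Sample-size requirements like these come from a standard power calculation. The sketch below is my own illustration (not from the book or from Simmons's paper), using the common normal approximation for a two-sided, two-group comparison at 80% power: n per group ≈ 2 × ((z₁₋α/₂ + z_power) / d)², where d is the standardized effect size (Cohen's d).

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a
    standardized effect of the given size (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = nd.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Conventional benchmarks (Cohen): large d = 0.8, medium d = 0.5, small d = 0.2.
print(n_per_group(0.8))  # 25 per group
print(n_per_group(0.5))  # 63 per group
print(n_per_group(0.2))  # 393 per group
```

A "large" effect needs only about twenty-five per group at 80% power, but a "small" one needs nearly four hundred, which is why a sample that can't reliably answer "do men weigh more than women?" has no hope against a subtler effect.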

Running a study with low statistical power is like setting out to look for distant galaxies with a pair of binoculars: even if what you’re looking for is definitely out there, you have essentially no chance of seeing it. Sadly, this point seems to have passed many scientists by, not least in Macleod’s chosen field of animal research. A 2013 review looked across a variety of neuroscientific studies, including, for example, research on sex differences in the ability of mice to navigate mazes.  To have enough statistical power to detect the typically expected sex effect in maze-navigating performance, a study would require 134 mice; it’s a much subtler effect, in other words, than ‘men weigh more than women’. But the average study the researchers looked at included only twenty-two mice. This isn’t specific to mice in mazes: it seems to be a problem across most types of neuroscience.  Large-scale reviews have also found that underpowered research is rife in medical trials, biomedical research more generally, economics, brain imaging, nursing research, behavioural ecology and – quelle surprise – psychology.

 
