Over the past few years, an international team of almost 200 psychologists has been trying to repeat a set of previously published experiments from its field, to see if it can get the same results. Despite its best efforts, the project, called Many Labs 2, has only succeeded in 14 out of 28 cases. Six years ago, that might have been shocking. Now it comes as expected (if still somewhat disturbing) news.

Often unremarked is that even when an experiment does replicate, the effect size declines. Half the ideas we thought to be true aren't, and even those that hold up don't make as much difference as originally claimed.
In recent years, it has become painfully clear that psychology is facing a “reproducibility crisis,” in which even famous, long-established phenomena—the stuff of textbooks and TED Talks—might not be real. There’s social priming, where subliminal exposures can influence our behavior. And ego depletion, the idea that we have a limited supply of willpower that can be exhausted. And the facial-feedback hypothesis, which simply says that smiling makes us feel happier.
One by one, researchers have tried to repeat the classic experiments behind these well-known effects—and failed. And whenever psychologists undertake large projects, like Many Labs 2, in which they replicate past experiments en masse, they typically succeed, on average, half of the time.
Ironically enough, it seems that one of the most reliable findings in psychology is that only half of psychological studies can be successfully repeated.
That failure rate is especially galling, says Simine Vazire from the University of California at Davis, because the Many Labs 2 teams tried to replicate studies that had made a big splash and been highly cited. Psychologists “should admit we haven’t been producing results that are as robust as we’d hoped, or as we’d been advertising them to be in the media or to policy makers,” she says. “That might risk undermining our credibility in the short run, but denying this problem in the face of such strong evidence will do more damage in the long run.”
Defenders of the realm have argued back that there is no replication crisis: the misleadingly named “crisis,” these skeptics say, has more mundane explanations. First, the replication attempts themselves might be too small. Second, the researchers involved might be incompetent, or lack the know-how to properly pull off the original experiments. Third, people vary, and two groups of scientists might end up with very different results if they do the same experiment on two different groups of volunteers.
The Many Labs 2 project was specifically designed to address these criticisms. With 15,305 participants in total, the new experiments had, on average, 60 times as many volunteers as the studies they were attempting to replicate. The researchers involved worked with the scientists behind the original studies to vet and check every detail of the experiments beforehand. And they repeated those experiments many times over, with volunteers from 36 different countries, to see if the studies would replicate in some cultures and contexts but not others. “It’s been the biggest bear of a project,” says Brian Nosek from the Center for Open Science, who helped to coordinate it. “It’s 28 papers’ worth of stuff in one.”

So not only are they finding replication failures for specific experiments, they are also showing that those failures stem from the poor quality of the original experiments, not from the poor quality of the replication attempts.
Despite the large sample sizes and the blessings of the original teams, the team failed to replicate half of the studies it focused on. It couldn’t, for example, show that people subconsciously exposed to the concept of heat were more likely to believe in global warming, or that moral transgressions create a need for physical cleanliness in the style of Lady Macbeth, or that people who grow up with more siblings are more altruistic. And as in previous big projects, online bettors were surprisingly good at predicting beforehand which studies would ultimately replicate. Somehow, they could intuit which studies were reliable.

Those last two lines might get lost in the verbiage, but they are important. They are continuing evidence for Philip Tetlock's work, which shows that, on average, informed generalists are more accurate in their forecasts than narrow experts. Many policy arguments revolve around an invocation of expertise (“Experts say . . .”, “The science indicates . . .”, etc.), a classic example of the informal fallacy known as appeal to authority.
An appeal to scientific authority is used to trump an appeal to common sense despite the evidence that informed and broadly knowledgeable people frequently outperform those with only narrow expertise.
Another interesting finding was the consistency of results across the participating teams.
Likewise, Many Labs 2 “was explicitly designed to examine how much effects varied from place to place, from culture to culture,” says Katie Corker, the chair of the Society for the Improvement of Psychological Science. “And here’s the surprising result: The results do not show much variability at all.” If one of the participating teams successfully replicated a study, others did, too. If a study failed to replicate, it tended to fail everywhere.