From Science Fictions by Stuart Ritchie. Page 140.
When I was an undergraduate student, between 2005 and 2009, candidate gene studies were the subject of intense and excited discussion. By the time I got my PhD in early 2014, they were almost entirely discredited. What happened? The main factor was that technology improved, making it possible to measure people’s genotypes much more cheaply. Consequently, we could use much larger samples in genetic studies, with sample sizes of many thousands, or tens of thousands, now in reach. Geneticists also started taking a different approach: instead of looking at just one or a handful of candidate genes, they looked simultaneously across many thousands of points on the DNA that vary between people, checking which of them were most strongly related to the traits in question. This approach is called a genome-wide association study (or GWAS), and the analyses in these studies had far better statistical power – so they could find genetic variants that had much smaller effects on the traits, in addition to the large effects of the well-established candidate genes.
Except the GWASs didn’t find the large effects of those candidate genes. They were nowhere to be seen, whereas if they were real, they’d have stuck out like the sorest of thumbs. Instead, the conclusion of the new, high-power GWASs was that, with a few very rare exceptions, complex human traits are generally related to many thousands of genetic variants, each of which appears to contribute only a minuscule effect. There was no space for large effects of single genes, which was completely at odds with the results of all those previously lauded candidate gene studies. Since then, efforts that specifically tried to replicate the candidate gene studies with high statistical power have produced flat-as-a-pancake null results for IQ test scores, depression and schizophrenia.
Reading through the candidate gene literature is, in hindsight, a surreal experience: they were building a massive edifice of detailed studies on foundations that we now know to be completely false. As Scott Alexander of the blog Slate Star Codex put it: ‘This isn’t just an explorer coming back from the Orient and claiming there are unicorns there. It’s the explorer describing the life cycle of unicorns, what unicorns eat, all the different subspecies of unicorn, which cuts of unicorn meat are tastiest, and a blow-by-blow account of a wrestling match between unicorns and Bigfoot.’
The whole sorry tale is a textbook example of the perils of low statistical power. The initial candidate gene studies, being small-scale, could only see large effects – therefore, large effects were what they reported. In hindsight, these large effects were extreme outliers, freak accidents of sampling error. The follow-up studies expected to see large effects too, so the sample sizes stayed relatively small. In this way, the studies capitalised on chance results, and built a chain of misleading findings that became the mainstream, gold-standard science in the field. To be sure, there were some null findings, and some meta-analysts sounded alarm bells about low power.
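The trap described above (sometimes called the “winner’s curse”) can be seen in a toy simulation. The numbers below are illustrative assumptions, not real genetic data: a variant with a tiny true effect is tested in many small studies and many large studies, and only the “significant” results are kept, as the early literature effectively did. In the small studies, the only results that clear the significance bar are the ones where sampling error has wildly inflated the effect.

```python
import random
import statistics

random.seed(42)

def simulate_study(n, true_effect=0.05):
    """One two-group study (variant carriers vs controls) with n per group.
    true_effect is a hypothetical, tiny standardized effect.
    Returns the observed mean difference and whether p < .05 (approx z-test)."""
    carriers = [random.gauss(true_effect, 1) for _ in range(n)]
    controls = [random.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(carriers) - statistics.mean(controls)
    se = (2 / n) ** 0.5  # standard error of the difference (both sds = 1)
    significant = abs(diff / se) > 1.96
    return diff, significant

# Compare a small-sample "candidate gene" design with a large-sample design.
for n in (50, 5000):
    results = [simulate_study(n) for _ in range(2000)]
    sig = [abs(d) for d, is_sig in results if is_sig]
    print(f"n={n}: {len(sig)}/2000 significant; "
          f"mean |effect| among significant results = {statistics.mean(sig):.3f}")
```

With n = 50 per group, a result is only significant when the observed difference exceeds roughly 0.39, nearly eight times the true effect, so every published “hit” is a freak overestimate; with n = 5000, the significant estimates cluster near the true value of 0.05.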
But most candidate-gene researchers ploughed on regardless. Had these geneticists known their history, they would’ve been highly suspicious of the large-effect genes: Ronald Fisher, the statistician who popularised the p-value and the idea of ‘statistical significance’, worked out that complex traits must be massively polygenic – that is, must be related to many thousands of small-effect genes – as far back as 1918.