Sunday, January 3, 2021

There's never just one way to analyse a dataset.

From Science Fictions by Stuart Ritchie.  Page 105.  

There's never just one way to analyse a dataset. Do you delete those outlying datapoints because you reason that they make your sample less representative of the population? Or do you leave them in? Do you split the sample up into separate age groups, or by some other criterion? Do you merge observations from week one and week two and compare them to weeks three and four, or look at each week separately, or make some other grouping? Do you choose this particular statistical model, or that one? Precisely how many ‘control’ variables do you throw in? There aren’t objective answers to these kinds of questions. They depend on the specifics and context of what you’re researching, and on your perspective on statistics (which is, after all, a constantly evolving subject in itself): ask ten statisticians, and you might receive as many different answers. Meta-science experiments, in which multiple research groups are tasked with analysing the same dataset or designing their own study from scratch to test the same hypothesis, have found a high degree of variation in method and results.
 
Endless choices offer endless opportunities for scientists who begin their analysis without a clear idea of what they’re looking for. But as should now be clear, more analyses mean more chances for false-positive results. As the data scientists Tal Yarkoni and Jake Westfall explain, ‘The more flexible a[n] … investigator is willing to be – that is, the wider the range of patterns they are willing to ‘see’ in the data – the greater the risk of hallucinating a pattern that is not there at all.’
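The claim that more analyses mean more chances for false positives can be made concrete with a little arithmetic: under the null hypothesis, each honest test has a 5% chance of a false positive, so an analyst who tries k independent analysis pipelines and reports the best one "finds" an effect with probability 1 − 0.95^k. The simulation below is my own illustrative sketch, not from the book, and it idealises each analysis choice as an independent test (real pipelines on the same data are correlated, so the inflation is usually milder, but the direction is the same):

```python
import random

random.seed(0)
ALPHA = 0.05  # conventional significance threshold

def run_study(n_analyses):
    # Under the null hypothesis, each (idealised, independent) analysis
    # yields a p-value uniform on [0, 1]. An analyst who tries several
    # analyses and keeps the best one commits a false positive whenever
    # the smallest p-value clears the threshold.
    return min(random.random() for _ in range(n_analyses)) < ALPHA

trials = 100_000
for k in (1, 5, 20):
    rate = sum(run_study(k) for _ in range(trials)) / trials
    analytic = 1 - (1 - ALPHA) ** k
    print(f"{k:2d} analyses: simulated false-positive rate {rate:.3f} "
          f"(analytic: {analytic:.3f})")
```

With a single pre-specified analysis the false-positive rate sits near the nominal 5%, but with twenty flexible analyses it climbs toward two-thirds: flexibility alone, with no real effect anywhere in the data, is enough to produce a publishable-looking result most of the time.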
