Friday, June 17, 2016

Granularity-related inconsistency of means

From The Economist, a surprisingly simple test to check research papers for errors.

I routinely rail against cognitive pollution arising from poorly structured experiments. This problem is especially prevalent in the social sciences, psychology, cultural studies, gender studies and similar fields. Virtually all findings in these fields are automatically suspect owing to small sample sizes, poor controls, self-selection and self-reporting, non-randomization of participants, etc.

Usually you can catch these cognitive pollution generators simply by looking at the methodology description. A team out of Europe has now come up with a simple objective measure to test the likelihood of study error.
The GRIM test, short for granularity-related inconsistency of means, is a simple way of checking whether the results of small studies of the sort beloved of psychologists (those with fewer than 100 participants) could be correct, even in principle. It has just been posted in PeerJ Preprints by Nicholas Brown of the University Medical Centre Groningen, in the Netherlands, and James Heathers of Poznan University of Medical Sciences, in Poland.

To understand the GRIM test, consider an experiment in which participants were asked to assess something (someone else’s friendliness, say) on an integer scale of one to seven. The resulting paper says there were 49 participants and the mean of their assessments was 5.93. It might appear that multiplying these numbers should give an integer product—ie, a whole number—since the mean is the result of dividing one integer by another. If the product is not an integer (as in this case, where the answer is 290.57), something looks wrong.

There is a wrinkle, though. Usually, the published value of the mean is rounded to two decimal places, for convenience. That rounding clearly affects whether the product of it and the sample size will be an integer. The GRIM test gets around this by rounding the product itself to the nearest integer (ie, 291), which is what the result would have to have been if the original numbers were accurate and the mean had not been rounded. That rounded product is then redivided by the sample size and the result of the calculation rounded to two decimal places. If this figure is not exactly the same as the original mean (and it is not, for it is 5.94) then either the original mean or the sample size is incorrect.
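To make the arithmetic concrete, here is a minimal Python sketch of the check as the article describes it. The function name `grim_consistent`, its parameters, and the use of half-up rounding are my own assumptions for illustration; the preprint may handle rounding edge cases differently. The mean is passed as a string and handled with `decimal` so that binary floating-point noise does not affect the comparison.

```python
from decimal import Decimal, ROUND_HALF_UP


def grim_consistent(mean_str: str, n: int, places: int = 2) -> bool:
    """Return True if a reported mean could arise from n integer responses.

    mean_str: the mean as printed in the paper, e.g. "5.93" (hypothetical input format)
    n:        the reported sample size
    places:   number of decimal places the mean was rounded to (assumed 2)
    """
    mean = Decimal(mean_str)
    quantum = Decimal(10) ** -places  # 0.01 when places == 2

    # Step 1: multiply mean by sample size and round to the nearest integer.
    # This is the sum the integer responses would have to add up to.
    total = (mean * n).quantize(Decimal(1), rounding=ROUND_HALF_UP)

    # Step 2: divide that integer sum by n again and round as the paper did.
    recomputed = (total / n).quantize(quantum, rounding=ROUND_HALF_UP)

    # Step 3: the reported mean is consistent only if it is reproduced exactly.
    return recomputed == mean


if __name__ == "__main__":
    print(grim_consistent("5.93", 49))  # the article's example
    print(grim_consistent("5.94", 49))  # the nearest mean that 49 integers can produce
```

With the article's numbers, 49 × 5.93 = 290.57 rounds to 291, and 291 / 49 rounds to 5.94, so the first call reports an inconsistency while the second passes. As the article notes, the test is only informative for small samples (fewer than 100 participants when means are given to two decimal places).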
Very clever. How does it work in real life?
When Mr Brown and Dr Heathers test-drove their method on 71 suitable papers published in three leading psychology journals over the past five years, what they found justified the pessimistic sounding label they gave it. Just over half the papers they looked at failed the test. Of those, 16 contained more than one error. The two researchers got in touch with the authors of these, and also of five others where the lone errors looked particularly egregious, and asked them for their data—the availability of which was a precondition of publication in two of the journals. Only nine groups complied, but in these nine cases examination of the data showed that there were, indeed, errors.

The mistakes picked up looked accidental. Most were typos or the inclusion of the wrong spreadsheet cells in a calculation. Nevertheless, in three cases they were serious enough to change the main conclusion of the paper concerned.

That, plus the failure of 12 groups to make their data available at all, is alarming.
Pretty much as expected given the reputational diminution of the social sciences over the past decade.
