From Science Fictions by Stuart Ritchie. Page 127.
Another particularly elegant method for testing whether the numbers reported in a paper check out has the decidedly inelegant name of ‘Granularity-Related Inconsistency of Means’ – or ‘the GRIM test’ for short. Devised by the data sleuths Nick Brown and James Heathers, the GRIM test can be used to check whether the average (specifically, the arithmetic mean) of a set of numbers makes sense, given how many numbers the set contains. Imagine you’re asking people to rate how happy they are with their job, on a scale of 0 to 10 (and you’ve only given them the option to respond in whole numbers – say, ‘4’ or ‘5’, but not ‘3.7’). In the simplest case, let’s say you gave this survey to just two people, and you report the average of their scores: that is, you add up their ratings and divide the total by two. If you take that result and look at the digits after the decimal point, there are only so many ways they can look: with two people, the average of their answers can only end in .00 or .50. If you said that the average was, say, 4.40, something must have gone wrong: there’s no way to divide a whole number in half that would produce that fraction.
The GRIM test applies this same logic to bigger samples. For example, if twenty participants rate something on a 0-to-10 whole-number scale, there’s no way you can arrive at an average of 3.08. Dividing by twenty means that the decimal values can only be increments of .05; it’s plausible to get an average of 3.00, or 3.10, or 3.15, but 3.08 is an impossibility. Brown and Heathers used the GRIM test to check a selection of seventy-one published psychology papers and found that half of them reported at least one impossible number, while 20 per cent contained several. As with statcheck, GRIM errors can have benign causes, but they are red flags that signal the need for further investigation.
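The logic above can be sketched in a few lines of code. This is a minimal illustration of the granularity check, not Brown and Heathers’s actual software: multiply the reported mean by the sample size, round to the nearest integer (the only sums that whole-number responses could actually produce), then re-average and see whether you land back on the reported mean.

```python
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if `reported_mean` could be the average of `n`
    whole-number responses, reported to `decimals` decimal places.

    (`grim_consistent` is a hypothetical helper name for illustration.)
    """
    # The nearest integer sum that could have produced this mean.
    total = round(reported_mean * n)
    # Re-average that sum and round to the reported precision.
    possible_mean = round(total / n, decimals)
    return possible_mean == round(reported_mean, decimals)


# The examples from the text: with twenty respondents, 3.15 is a
# possible average, but 3.08 is not; with two respondents, 4.40 is
# impossible while 4.50 is fine.
print(grim_consistent(3.15, 20))  # True
print(grim_consistent(3.08, 20))  # False
print(grim_consistent(4.40, 2))   # False
print(grim_consistent(4.50, 2))   # True
```

In practice a published mean might sit near the boundary of two rounding conventions, so real GRIM checking allows for both round-half-up and round-half-to-even before flagging a value; the sketch above ignores that subtlety.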
The number 3.08 in my example was a deliberate choice, because it’s a notable one from the history of the GRIM test – and psychology research in general. In 2016, the psychologist Matti Heino applied the GRIM test to one of the most famous psychology papers of all time: Leon Festinger and James Carlsmith’s 1959 paper on ‘cognitive dissonance’. This is the now widely known idea that forcing someone to say or do something inconsistent with their true beliefs will make them psychologically uncomfortable and they’ll do their best to alter those beliefs to make them fit with what they’ve been made to say or do. In the 1959 study, participants were made to complete some tedious, pointless tasks, such as endlessly twisting pegs around on a pegboard. When they were finished, some were paid $1 to tell the next waiting participant that they’d found the tasks really interesting and enjoyable. In an interview afterwards, the participants who’d been paid $1 to lie about the tasks reported thinking the task was much more enjoyable than those who were paid nothing. They’d reduced their dissonance, in other words, by making themselves believe they’d had fun. Alas, Heino’s use of the GRIM test showed that it wasn’t just the participants’ beliefs that were inconsistent – it was Festinger and Carlsmith’s numbers. They reported an average score of 3.08 for a sample of twenty people filling in a scale of 0-to-10, which as we just saw isn’t possible, alongside several other averages that failed the GRIM test.