In Why Most Published Research Findings are False, Alex Tabarrok links to other research that highlights some of the structural and procedural reasons why so much published research is incorrect; in some fields the majority of findings are non-replicable. Helpfully, Tabarrok includes a summary of heuristics for assessing the possible validity of any published research (a short simulation after the list illustrates the first few).
1) In evaluating any study try to take into account the amount of background noise. That is, remember that the more hypotheses which are tested and the less selection which goes into choosing hypotheses the more likely it is that you are looking at noise.
2) Bigger samples are better. (But note that even big samples won't help to solve the problems of observational studies which is a whole other problem).
3) Small effects are to be distrusted.
4) Multiple sources and types of evidence are desirable.
5) Evaluate literatures not individual papers.
6) Trust empirical papers which test other people's theories more than empirical papers which test the author's theory.
7) As an editor or referee, don't reject papers that fail to reject the null.
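The first few heuristics are easy to make concrete. Below is a minimal sketch in Python; all of the numbers are made up and nothing in it comes from any of the studies discussed here. It runs a couple of hundred hypothesis tests on pure noise, with two groups of 18 (a total of 36, matching the sample size criticized below, assuming an even split), and counts how many come out 'statistically significant' anyway.

```python
# Minimal sketch (hypothetical numbers only): test many hypotheses against pure
# noise with small samples and see how many clear p < 0.05 by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_hypotheses = 200   # many loosely selected hypotheses (heuristic 1)
n_per_group = 18     # two groups of 18, a total sample of 36 (heuristic 2)
alpha = 0.05

false_positives = 0
for _ in range(n_hypotheses):
    # Both groups are drawn from the SAME distribution: the true effect is zero.
    treatment = rng.normal(0, 1, n_per_group)
    control = rng.normal(0, 1, n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    if p < alpha:
        false_positives += 1

print(f"{false_positives} of {n_hypotheses} noise-only comparisons came out "
      f"'significant' at p < {alpha} despite there being no real effect.")
```

With a 5% threshold, roughly ten of the two hundred noise-only comparisons will clear the bar, and those are exactly the ones most likely to get written up.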
Then there is this piece from Real Clear Science about why certain myths and fallacies are widely believed (because they are repeated): Repetition: The Key to Spreading Lies by Ross Pomeroy.
This is followed by an article in the Daily Telegraph, Blood pressure drug 'reduces in-built racism' by Stephen Adams, on some newly reported results that highlight the issues raised in the three articles above.
1) Background noise (the more hypotheses which are tested and the less selection which goes into choosing hypotheses the more likely it is that you are looking at noise) - The social sciences are notorious for background noise. Everyone in academia is always looking for racism in virtually any context, i.e. there are a lot of hypotheses and little selectivity.
2) Bigger samples are better - A sample size of 36 is almost laughable.
3) Small effects are to be distrusted - Can't tell how large or small the effects are because they aren't reported. They describe the effects as causing the recipients to score significantly lower on the Implicit Association Test, but they don't actually say by how much. One would expect that if the effect were truly significant, they would actually provide the measure of significance; the absence of such a measure calls the characterization into doubt (see the sketch after this list).
4) Multiple sources and types of evidence are desirable - One report from one group.
5) Evaluate literatures not individual papers - One paper only.
6) Trust empirical papers which test other people's theories more than empirical papers which test the author's theory - This appears to be a paper by researchers testing their own theory.
7) As an editor or referee, don't reject papers that fail to reject the null - NA.
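To put a rough number on how little a sample of 36 can show, here is another small sketch, again with purely hypothetical figures rather than anything taken from the paper: it estimates by simulation the chance that a two-group study with 18 participants per arm detects effects of conventionally small, medium and large size (Cohen's d of 0.2, 0.5 and 0.8).

```python
# Minimal sketch (hypothetical figures): simulated statistical power of a
# two-group comparison with 18 participants per group, for several effect sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def power_at(d, n_per_group=18, alpha=0.05, trials=5000):
    """Fraction of simulated studies that reach p < alpha when the true
    standardized effect (Cohen's d) is d."""
    hits = 0
    for _ in range(trials):
        treated = rng.normal(d, 1, n_per_group)  # group shifted by d standard deviations
        control = rng.normal(0, 1, n_per_group)
        _, p = stats.ttest_ind(treated, control)
        if p < alpha:
            hits += 1
    return hits / trials

for d in (0.2, 0.5, 0.8):  # conventional small / medium / large effects
    print(f"Cohen's d = {d}: power ≈ {power_at(d):.2f} with 18 per group")
```

At that size, even a conventionally large effect is detected only around two-thirds of the time and a small one less than one time in ten, so a bare claim of 'significantly lower' scores with no effect size reported says very little about how big, or how believable, the effect actually is.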
Further, there is no discussion at all of the very questionable foundations of what the Implicit Association Test is actually measuring and whether it is real and useful. It is not that the test has been debunked, only that it is controversial, with much debate still extant about both the reality of the phenomenon and its pertinence.
So you have an unmeasured effect from a tiny sample using disputed measurement tools applied to a recent and debated field in a study conducted by researchers with a stake in the outcome. More red flags than a Maoist parade. And this was reported in some three hundred articles within a couple of days of the news release (see Repetition: The Key to Spreading Lies by Ross Pomeroy above). Could the cognitive waters get more muddied?