Wednesday, October 16, 2013

It simply fails to remove the plausibility of that hypothesis

A great example of the distinction between telling a story (the reporter's interpretation of events) and telling the story (the reporter's report of the facts). Pam Belluck of the New York Times reports on the results of a social psychology study in For Better Social Skills, Scientists Recommend a Little Chekhov. Regrettably, it is almost 100% gullible interpretation and 0% critical reporting.

The story is that the scientists conducted a study that purports to support the popular notion that reading literary fiction (as opposed to other forms of fiction such as romance, mysteries, etc.) improves one's social skills.

But that is not what was done at all. The reporter has done hardly any reporting and instead focused on telling a story rather than the story. The story that the reporter wants to tell, and that many readers enthusiastically endorse, is that readers of literary fiction build a greater reservoir of empathy through that reading and are therefore more socially adept than non-readers.

The original study can be viewed here, Reading Literary Fiction Improves Theory of Mind by David Comer Kidd and Emanuele Castano.

The claims in the NYT report and elsewhere about what the study shows include:
"The results suggest that reading fiction is a valuable socializing influence."

"But psychologists and other experts said the new study was powerful because it suggested a direct effect — quantifiable by measuring how many right and wrong answers people got on the tests — from reading literature for only a few minutes."

"Experts said the results implied that people could be primed for social skills like empathy, just as watching a clip from a sad movie can make one feel more emotional."

"“This really nails down the causal direction,” said Keith Oatley, an emeritus professor of cognitive psychology at the University of Toronto."
Reading the comments on the NYT article, there is a clear preponderance, probably 90%, who interpret the study as supporting what they already knew. But there are some lone voices calling for actual attention to the scientific method.

When reviewing new science findings, you look for several key elements to determine how much weight to attach to them. These usually include:
How rigorous was the design of the study?
Are there indications that the study was intended to produce predetermined outcomes?
Who is paying for the study?
What is the professional standing of the authors?
How many people participated?
What was the time duration of the study?
How diverse (age, class, profession, culture, race, religion, etc.) were the participants?
How consistent are the findings with other comparable studies?
How rigorously are counterfactuals addressed?
How rigorously are terms defined?
Are alternative views or opinions discussed?
How careful are the researchers to ensure that apples are compared to apples?
How specific are the measures of performance?
To what degree were the studies lab-bound versus real-world observational?
To what degree does the report address direct results as opposed to proxy results (e.g., demonstrated empathy versus measured empathy)?
To what extent are the participants neutral, and to what extent do they have a stake in the study outcome?
Is there a before-and-after baseline?
Do they report the absolute and relative degrees of performance improvement?
Are these present in this study or in the NYT summary of the study? Are they present to the degree that would support the enthusiastic reception and overinterpretation of the results? Hard to say, since the study is gated. But from what is reported in the NYT and Scientific American articles we can answer some of the questions.
How rigorous was the design of the study? - Not at all; see below, and see Annals of overgeneralization by Mark Liberman for a good rundown of the specific weaknesses.

Are there indications that the study was intended to produce predetermined outcomes? - From the researchers' comments, it appears that they found what they were expecting to find.

Who is paying for the study? - Unknown

Professional standing of the authors - Affiliated with The New School for Social Research, which is reasonably credible. However, the fields of sociology and psychology are plagued with extraordinarily high rates of withdrawn or unreplicated research.

How many people participated? - Unknown. However, the sample of texts being compared was markedly limited, usually three texts per condition in each of the five experiments. Meaninglessly small samples, as the sketch below illustrates.
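
To make the stimulus-sampling problem concrete, here is a minimal simulation. All numbers are hypothetical (the paper is gated): the point is only that if individual texts within a genre naturally vary in how much they move an empathy score, then the average of just three texts is a noisy estimate, and apparent genre differences of a point or two can arise from text selection alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers for illustration only: assume individual texts
# shift an empathy score by a random amount (SD of 1.5 points on a
# 30-point scale) even when the genres are truly identical.
text_effects = rng.normal(loc=0.0, scale=1.5, size=(10_000, 3))

# Each row is one "experiment" averaging only 3 texts per condition.
condition_means = text_effects.mean(axis=1)

print(f"SD of a 3-text condition mean: {condition_means.std():.2f} points")
# Roughly 0.9 points, so 1-2 point "genre effects" are well within the
# range produced by text sampling alone, with no true genre difference.
```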

What was the time duration of the study? - Reading segments were 3-5 minutes, and then participants were immediately tested for empathy. That leaves open the question of whether there is any lasting impact, i.e. read for five minutes and then test three hours or three days later.

How diverse (age, class, profession, culture, race, religion, educational attainment, etc.) were the participants? - Unknown, but there is a real likelihood that the participants were unrepresentative across the measures of diversity. They seem to have ensured at least some diversity of age (18-75), but since they were recruited via Amazon's Mechanical Turk, that might imply certain preconditions around pre-existing reading habits, class, income, technology access, education, etc.

How consistent are the findings with other comparable studies? - Not discussed in the NYT article. Another research report published the same month reports the exact opposite of this study's finding. For example, What You Read Matters: The Role of Fiction Genre in Predicting Interpersonal Sensitivity by Katrina Fong et al., examining four genres (but not including literary fiction), indicates that "Romance and Suspense/Thriller genres remained significant predictors of interpersonal sensitivity."

How rigorously are counterfactuals addressed? - Not at all.

Are alternative views or opinions discussed? - No. Only voices endorsing the findings are quoted in the article.

How rigorously are terms defined? - Not at all. Literary fiction is not defined in terms that identify specific attributes distinguishing it from popular or genre fiction.

How careful are the researchers to ensure that apples are compared to apples? - Intentionally designed to skew the results towards a favorable outcome for literary fiction. For example, literary non-fiction was intentionally excluded. By failing to make apples-to-apples comparisons, the researchers leave open the glaring possibility that it is the literariness of the writing that might make a difference rather than the fictionality. This obvious act of exclusion reinforces the perception that the researchers were seeking a particular conclusion.

How specific are the measures of performance? - The measures are not reported at all. From elsewhere it appears that a 30-point scale was used for measuring empathy, but that is not referenced in the NYT article.

Do they report the absolute and relative degrees of performance improvement? - Not in the NYT article. From elsewhere it appears that the degree of difference in the various experiments was 1-2 points on the 30-point scale. It appears that while the results might have been statistically significant, they were not, in fact, material, as the toy calculation below suggests.
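
Here is a toy calculation of the significance-versus-materiality gap. Since the study's raw data are gated, the means, spreads, and sample sizes below are assumptions, not the study's: the point is only that a difference of about 1.5 points on a 30-point scale can be highly statistically significant while remaining a modest effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumed values for illustration: two groups on a 30-point empathy
# test, true means 1.5 points apart, within-group SD of 4 points.
literary = rng.normal(loc=21.5, scale=4.0, size=300)
popular = rng.normal(loc=20.0, scale=4.0, size=300)

t_stat, p_value = stats.ttest_ind(literary, popular)
pooled_sd = np.sqrt((literary.var(ddof=1) + popular.var(ddof=1)) / 2)
cohens_d = (literary.mean() - popular.mean()) / pooled_sd

print(f"p = {p_value:.2g}")           # comfortably "significant"
print(f"Cohen's d = {cohens_d:.2f}")  # yet only ~5% of the scale
```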

To what degree were the studies lab-bound versus real-world observational? - Completely lab-bound.

To what degree does the report address direct results as opposed to proxy results (e.g., demonstrated empathy versus measured empathy)? - Completely reliant on proxy results. The subjects' level of empathy was determined through tests rather than demonstrated behaviors.

To what extent are the participants neutral, and to what extent do they have a stake in the study outcome? - Unclear, though it appears that there may have been a degree of self-selection among participants. In addition, they were given nominal compensation for participation.

Is there a before-and-after baseline? - No. This is especially critical when dealing with small populations.

So The Story is that a couple of researchers in a field notorious for flawed research conducted a badly designed experiment involving too few samples, with inconsistent comparisons, with selected participants (not random), and with poorly defined categories, in an apparent attempt to yield an expected outcome. The reporter revealed hardly any of that, instead telling A Story, the story she would like to believe, and apparently one that most of her readers would also like to believe: reading literary fiction makes you a better person.

As it happens, I would like to believe, and I suspect it is true, that enthusiastic reading contributes to positive life outcomes. I can demonstrate a correlation for that proposition but am weak on establishing the causative direction and the causative process. I am somewhat skeptical, but open to the proposition, that enthusiastic reading over time might have some contributory effect towards greater degrees of empathy. However, this study does not provide any support for that proposition. Instead, it is just more cognitive pollution; popularly received and endorsed pollution, but pollution nonetheless.

So all the enthusiastic conclusions mentioned above: valuable socializing influence, a direct effect, causal direction, etc.? Tosh! All tosh. People are stating what they want to believe. This study provides no evidence for any of those beliefs.

Commenter zstansfi at Scientific American has a great observation:
This common fallacy needs to be rooted out. An empirical study which is "consistent with" (that is, does not oppose), but yet which remains entirely unsupportive of a hypothesis does not, in fact, strengthen that hypothesis. It simply fails to remove the plausibility of that hypothesis.
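
In Bayesian terms (my gloss, not the commenter's), the point is that posterior odds are prior odds multiplied by a likelihood ratio:

```latex
\frac{P(H \mid E)}{P(\neg H \mid E)} =
\frac{P(E \mid H)}{P(E \mid \neg H)} \cdot \frac{P(H)}{P(\neg H)}
```

Evidence E that is merely "consistent with" a hypothesis H is evidence roughly as probable under H as under its negation; the likelihood ratio is then close to 1, and the posterior odds barely move from the prior odds. To strengthen H, the evidence must be more probable under H than under the alternatives.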
