Wednesday, March 19, 2014

People use the lack of data as an excuse to avoid having to examine their premises

From What the Fox Knows by Nate Silver. Silver is always interesting and insightful.
Conventional news organizations on the whole are lacking in data journalism skills, in my view. Some of this is a matter of self-selection. Students who enter college with the intent to major in journalism or communications have above-average test scores in reading and writing, but below-average scores in mathematics. Furthermore, young people with strong math skills will normally have more alternatives to journalism when they embark upon their careers and may enter other fields.
[snip]
But often, general managers and CEOs and op-ed columnists use the lack of data as an excuse to avoid having to examine their premises.
[snip]
So perhaps we should think more carefully about the process by which anecdote is transformed into data and information. We might break it down into four rough steps:

The first step is the collection of data or evidence. For a traditional journalist, this is likely to involve some combination of interviewing, documentary research and first-person observation. But data journalists also have ways of collecting information, such as by commissioning polls, performing experiments or scraping data from websites.

The next step is organization. Traditional journalists have a well-established means of organizing information: They formulate a news story. The story might proceed chronologically, in order of importance (the inverted pyramid) or in some other fashion. Data journalists, meanwhile, can organize information by running descriptive statistics on it, by placing it into a relational database or by building a data visualization from it. Whether or not a picture is worth a thousand words, there is value in these approaches both as additional modes of storytelling and as foundations for further analysis.

The third step is explanation. In journalistic terms, this might mean going beyond the who, what, where and when questions to those of why and how. In traditional journalism, stories of this nature are sometimes referred to as “news analysis” or “explanatory journalism.” Data journalists, again, have their own set of techniques — principally running various types of statistical tests to look for relationships in the data.

Let’s pause here for a moment. Up through the first two steps, traditional journalists look very good. The original reporting they do is tremendously valuable. Besides, most of us learn by metaphors and stories. So traditional journalism’s method of organizing information into stories has a lot of appeal when news happens.

By the third stage, however, traditional journalism has begun to produce uneven results — at least in my view. Take the best-selling book “Double Down” by Mark Halperin and John Heilemann. It contains a lot of original and extremely valuable reporting on the 2012 campaign. Its prose style doesn’t match mine, but it’s a crisp and compelling read. But Halperin and Heilemann largely fail at explaining how Barack Obama won re-election, or why the campaign unfolded as it did.

For example, they cite three factors they say were responsible for Mitt Romney’s decline in the polls in early to mid-September: the comparatively inferior Republican convention, Romney’s response to the attacks in Benghazi, Libya, and Romney’s gaffe-filled trip to London. In fact, only one of these events had any real effect on the polls: the conventions, which often swing polls in one direction or another. (This does not require any advanced analysis — it’s obvious by looking at the polls immediately before and after each event.)

Explanation is more difficult than description, especially if one demands some understanding of causality. It’s something every field struggles with; there are lots and lots of wrongheaded statistical analyses, for instance.

Still, there are some handicaps that conventional journalism faces when it seeks to move beyond reporting on the news to explaining it. One problem is the notion of “objectivity” as it’s applied in traditional newsrooms, where it’s often taken to be synonymous with neutrality or nonpartisanship. I prefer the scientific definition of objectivity, where it means something closer to the truth beyond our (inherently subjective) perceptions. Leave that aside for now, however. The journalistic notion of objectivity, however flawed, at least creates some standard by which facts are introduced and presented to readers.

But while individual facts are rigorously scrutinized and checked for accuracy in traditional newsrooms, attempts to infer causality sometimes are not, even when they are eminently falsifiable. (The increased speed of the news-gathering process no doubt makes this problem worse.) Instead, while the first two steps of the process (collecting and organizing information in the form of news stories) are thought to fall within the province of “objective” journalism, explanatory journalism is sometimes placed in the category of “opinion journalism.” My disdain for opinion journalism (such as in the form of op-ed columns) is well established, but my chief problem with it is that it doesn’t seem to abide by the standards of either journalistic or scientific objectivity. Sometimes it doesn’t seem to abide by any standard at all.

A more data-centric approach is perhaps most helpful, however, when it comes to the fourth step, generalization.

Suppose you did have a credible explanation of why the 2012 election, or the 2014 Super Bowl, or the War of 1812, unfolded as it did. How much does this tell you about how elections or football games or wars play out in general, under circumstances that are similar in some ways but different in other ways?

These are hard questions. No matter how well you understand a discrete event, it can be difficult to tell how much of it was unique to the circumstances, and how many of its lessons are generalizable into principles. But data journalism at least has some coherent methods of generalization. They are borrowed from the scientific method. Generalization is a fundamental concern of science, and it’s achieved by verifying hypotheses through predictions or repeated experiments.
I like the proposed model: Collection, Organization, Explanation, Generalization. It is powerful in its simplicity.
