Sunday, March 10, 2013

Reading, publishing and ethnicity

This essay is a response to an argument made on a children's literature forum. The argument originally advanced was that 1) there is a disproportionate scarcity of children’s books by and about children from the major minority ethnic groups in the US, 2) that this scarcity has a measurable negative impact on their longitudinal life outcomes, and 3) that the scarcity of such representation in children’s books can be attributed to conscious or unconscious bias and prejudice within the publishing industry.

The discussion of this argument added a further wrinkle: Do we know by which attributes children elect to self-identify and are those the same attributes which serve as the focus among adults? More critically, at what ages do children become aware of gender, race, class, etc. and are those ever the attributes by which they electively identify themselves (and the characters about whom they read)?

The answers provided so far are:
1) No, there is no study which has sought to quantify the nature and degree of possible over and underrepresentation of different groups in children’s literature. We do not know which, if any, groups are over or underrepresented, nor do we know by what degree.

2) No, there is no empirical evidence that over or underrepresentation in terms of traditional attributes of race, class, culture, gender or orientation has any measurable impact on life outcomes.

3) No, there is no empirical research that identifies the literary characters with whom children identify, nor is there any research that identifies which of those characters’ key attributes cause children to identify with them.

4) No, there is no evidence that the publishing industry (agents, editors, etc.) is actively discriminating against any class of author or type of book for any reason other than assessed commercial viability.

Despite the absence of any empirical evidence supporting the initial argument, there are logical and anecdotal reasons to accept or reject each of the four elements of the argument.
1. Over and underrepresentation – Likely true. There is representational variation in all other fields of endeavor with certain fields dominated by one gender or the other, members of one ethnic group or another, class, etc. Applying Ockham’s Razor, it is logical and reasonable to assume that there is also disparate representation in the field of children’s literature. By which groups and to what degree remains unknown.

2. Life Outcomes – Likely false. The assumption that representation in children’s literature is a necessary element of good life outcomes is refuted by the positive sociological metrics of various recent immigrant groups (Nigerians, Koreans, Haitians, Dominicans, sub-continental Indians, etc.) who, by their recentness of arrival, are completely or substantially underrepresented and yet still achieve positive to greater than average outcomes. The research of Anda, Felitti, et al. also provides a sense of proportionality. Their research indicates that major childhood traumas (parental divorce, sexual abuse, battered parent, etc.) do reliably predict future negative life outcomes (job problems, financial problems, absenteeism, etc.). However, the overall effect is often much less than one might anticipate. For example, adults who suffered sexual abuse as children do have a 40% greater probability of having job problems. However, in absolute terms the overall incidence of job problems among all adults is only 11.4%. Of adults who did not suffer childhood sexual abuse, 10.6% end up having job problems anyway, versus 14.4% of those who did suffer sexual abuse. This is not to discount the gravity of the original tragedy but to put it into numerical perspective. If such grave adverse childhood experiences as abuse, battery, divorce, etc. have a real and measurable impact but at a rate much lower than expected, is it likely that representation in books has any predictive negative impact? Possible but not particularly likely.

3. Traditional attributes as identities by which children define themselves and identify with protagonists – Likely false but debatable. Attributes of self-identity are highly variable between cultures, among individuals, and over time. In addition, there is evidence that children’s capacity to distinguish by key demographic attributes is not inherent but an emergent skill. Some capabilities to distinguish attributes appear to emerge relatively late in childhood development. Additionally, while there is little empirical evidence that children reliably identify with protagonists by such traditional attributes (race, class, culture, gender, etc.), there is a fair amount of evidence that the identification/affiliation children do form is situational in nature rather than demographic. In other words, a shared situational issue such as first day at school, bullying, isolation, moral quandary, etc. trumps demographic attributes such as race or gender in terms of a child’s affiliation with the protagonist or character.

4. Publishing industry bias – Likely false. While it is accepted, and likely true, that middle-class white women are disproportionately overrepresented in the book publishing industry, it is also true that occasional surveys of the media indicate an overwhelming self-identification as liberal or registered Democrat, groupings usually associated with an explicit rejection of discriminatory practices and affirmative sympathy for underprivileged and underrepresented groups. This does not refute the argument for unconscious bias, but it makes that argument less likely to be accurate.
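
The proportionality point in item 2 rests on the difference between relative and absolute risk. A minimal sketch of that arithmetic, using only the two rates quoted above (the variable names are illustrative, not from the cited study):

```python
# Rates of adult job problems quoted in item 2 of the essay.
rate_abused = 0.144      # adults who suffered childhood sexual abuse
rate_not_abused = 0.106  # adults who did not

# Relative increase: how much larger one rate is than the other.
relative_increase = rate_abused / rate_not_abused - 1
# Absolute increase: the gap between the two rates.
absolute_increase = rate_abused - rate_not_abused

print(f"relative increase: {relative_increase:.0%}")
print(f"absolute increase: {absolute_increase * 100:.1f} percentage points")
```

The raw ratio of the two quoted rates comes to about 36%, in the neighborhood of the 40% figure cited; the essay does not say how the 40% was derived. The absolute gap is under four percentage points, which is the sense of proportion the item is after.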

This essay is an attempt to produce a provisional answer for only the first and fourth items: Are different racial/ethnic groups over and underrepresented in children’s literature, and can that be attributed to flaws in the nature of the publishing industry?

The answer, as always in sociological issues, is nuanced.

The analysis requires several pieces of information. First, what is the ethnic distribution in the US? This was obtained from the 2011 US Census.

Table 1: Percentage of the Population (US 2011 Census)

Caucasian 63.7%
African-American 12.6%
Asian 4.8%
Hispanic 16.4%
Native American 0.9%

Next, we need to know what percentage of each of those populations reads electively each year. This was obtained from the National Endowment for the Arts report, Reading At Risk.

Table 2: Percentage of the Population Who Read Electively

Group | Those that Electively Read | Reading Population (in millions)
Caucasian | 51.4% | 101
African-American | 37.1% | 14
Asian | 55.0% | 8
Hispanic | 26.5% | 13
Native American | 30.0% | 1

Next we need to know how much each group reads per year. This information was obtained from the Bureau of Labor Statistics. Regrettably they only have information for White, Black, and Hispanic. A plug figure for Asians has been used, setting it at an equivalent amount to Whites based on similarity of the patterns between the two groups in other BLS data. A plug figure has been used for Native Americans that is midway between the numbers for African-Americans and that of Hispanics, again based on such patterns elsewhere in other BLS data.

Table 3: Hours Spent Reading Each Year
Hours Spent Reading per year
Caucasian 192
African-American 84
Asian 192
Hispanic 74
Native American 79

Hours spent reading covers books, newspapers and magazines, so we need to determine how much time is spent reading books versus newspapers and magazines. This information is available from a large-scale study by the Kaiser Family Foundation. Again, we are missing detail for Asians and Native Americans, and again plug figures have been used. Asians have been assigned the same figure as Whites, and Native Americans have been assigned the higher of the two figures for African-Americans and Hispanics.

Table 4: Percent of Time Spent Reading Books
Percent of Time Spent Reading Books
Caucasian 93%
African-American 55%
Asian 93%
Hispanic 88%
Native American 88%

Finally, we need to estimate the book buying patterns among the different groups. All book reading could theoretically occur in the library. At the other extreme, we could assume that everyone purchases every book that they read. Of course, the reality is in between, with most people both reading from the library and purchasing books. Libraries, based on industry estimates, account for between 20 and 30 percent of the book purchasing market. Without specific usage patterns, we have to make assumptions. For analysis purposes, I have assumed that everyone with an income above the national household median (approximately $50,000 in 2011) purchases books for their reading and everyone with an income below the median uses the library. This is not a strictly accurate assumption, since most people use a mix, but it is likely a close proxy to reality. So we need to know what percentage of each population group is above or below the national household income median. This is available from the US 2011 Census.

Table 5: Percent of each Group Above the National Household Median Income
Percent Who Can Purchase Books
Caucasian 52%
African-American 34%
Asian 61%
Hispanic 38%
Native American 34%

From the above data, it is possible to calculate the percentage representation of each racial group in terms of the market of who buys books. In other words, knowing how much each group reads, the split that should be accorded to books, how able they are to afford books, etc. we are able to calculate the market in terms of book buyers.

Table 6: Book Market by Race of Reader
Market Share
Caucasian 84%
African-American 3%
Asian 7%
Hispanic 6%
Native American 0.3%
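
The essay does not spell out the formula behind Table 6. The sketch below is one combination that reproduces the published shares after rounding: population share (Table 1) times hours read per year (Table 3) times the fraction of reading time spent on books (Table 4) times the fraction of the group above the median income (Table 5). Treat the formula itself, and the names `groups` and `market_shares`, as assumptions made for illustration. The elective-reading rates of Table 2 are not part of this product; including them as a further factor would shift the shares.

```python
# Inputs copied from Tables 1, 3, 4 and 5 of the essay.
# Tuple order: population share (%), reading hours per year,
# fraction of reading time spent on books, fraction above median income.
groups = {
    "Caucasian":        (63.7, 192, 0.93, 0.52),
    "African-American": (12.6,  84, 0.55, 0.34),
    "Asian":            ( 4.8, 192, 0.93, 0.61),
    "Hispanic":         (16.4,  74, 0.88, 0.38),
    "Native American":  ( 0.9,  79, 0.88, 0.34),
}


def market_shares(data):
    """Return each group's share (%) of estimated purchased book-reading hours."""
    # ASSUMPTION: weight = population x hours x book fraction x buyer fraction.
    weights = {
        group: pop * hours * books * buyers
        for group, (pop, hours, books, buyers) in data.items()
    }
    total = sum(weights.values())
    return {group: 100 * w / total for group, w in weights.items()}


shares = market_shares(groups)
for group, share in shares.items():
    print(f"{group:<17} {share:5.1f}%")
```

Because every factor is multiplied, a group that is low on several inputs at once ends up with a market share far below its population share, which is the pattern Table 6 shows.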

With one further piece of information, we are in a position to assess the degree of over and underrepresentation in children’s books and the degree to which this reflects market reality or prejudice by publishers.

In 2011, the CCBC received approximately 3,400 children’s books from publishers, just under 10% of the 36,027 children’s books estimated by Bowker as being published in 2011. In terms of analysis, much depends on whether the 3,400 books received by the CCBC are representative of the overall total. Since the books are self-selected by publishers for submission, 3,400 is certainly not a random sample; but is it a representative sample? It is possible that they are unrepresentative but it is not possible to know. My guess is that major publishing houses are overrepresented and small and medium publishing houses are underrepresented. Since the publishing houses which specialize in multicultural and ethnic oriented publishing are small, it is possible that the CCBC sample underrepresents multicultural/ethnic publishing. On the other hand, 10% is a very large sampling for statistical purposes, and that tends to reduce the consequences of random versus self-selected sampling. In addition, the 36,027 figure likely includes a high percentage that are repackagings of past years’ books. In other words, a reissue of Anne of Green Gables with a new cover will show up as a new title in the Bowker numbers but would not be representative of new books brought to market, which is what the CCBC numbers represent. My assumption is that the CCBC population of books is representative of the larger market of new children’s books published in 2011.

There is one further assumption that needs to be addressed. The CCBC conducts an analysis by which it identifies each book as to whether or not it has significant content for each minority group. It does not do a comparable assessment for Whites, which we need in order to complete our analysis. The first step is to make an adjustment for the fact that not all children’s books have characters who lend themselves to racial assessment. Specifically, most concept books, much fantasy and a reasonable chunk of science fiction lack protagonists who are racially identifiable. I have made an adjustment of 10% of the total book population for this issue, which I suspect is conservative. Secondly, the fact that a book has significant content devoted to one or more minority groups does not preclude its also having significant White content. I have made the assumption, purely speculatively, that two thirds of books with significant minority content are substantially only minority content.

The CCBC reports that 3.6% of the books they receive have significant African-American content, 2.7% have significant Asian content, 1.7% significant Hispanic content and 0.8% significant Native American content. With the above stated assumptions, we can calculate that 84% of books have significant White content.
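
The essay gives the two assumptions but not the exact arithmetic. One reading that reproduces the 84% figure is sketched below: start from 100%, subtract the 10% of books treated as not racially identifiable, then subtract two thirds of the combined minority-content share as minority-only books. The constant names are illustrative, and the formula is an assumption rather than the author's stated method.

```python
# CCBC shares of books with significant content for each group (%), from the essay.
minority_content = {
    "African-American": 3.6,
    "Asian": 2.7,
    "Hispanic": 1.7,
    "Native American": 0.8,
}

NOT_RACIALLY_IDENTIFIABLE = 10.0  # the essay's adjustment (%), itself an assumption
MINORITY_ONLY_FRACTION = 2 / 3    # the essay's speculative assumption

minority_total = sum(minority_content.values())
minority_only = MINORITY_ONLY_FRACTION * minority_total
# ASSUMPTION: everything not excluded above counts as significant White content.
white_content = 100.0 - NOT_RACIALLY_IDENTIFIABLE - minority_only

print(f"minority content total:  {minority_total:.1f}%")
print(f"estimated White content: {white_content:.0f}%")
```

The result is sensitive to both constants: each additional percentage point of non-identifiable books lowers the White figure by a full point, so the 84% should be read as a rough estimate.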

The outcome of CCBC’s calculations by race is that 9% of published books had significant minority content versus an overall minority population of 35% and a minority population of book buyers of 16%. Consequently it is possible to confirm that minorities are in the aggregate materially underrepresented. However, the aggregate numbers mask interesting and useful variations by race.

So the final results in terms of representativeness in children’s literature are as follows. Not all columns add to 100 based on rounding and based on adjusted numbers (e.g. non-racially identifiable content).

Table 7: Racial Representation in Book Publishing

Group | Percentage of the Population | Book Buying Market Share | Representation in Published Books (CCBC)
Caucasian | 63.7% | 84% | 84%
African-American | 12.6% | 3% | 3.6%
Asian | 4.8% | 7% | 2.7%
Hispanic | 16.4% | 6% | 1.7%
Native American | 0.9% | 0.3% | 0.8%

It is easier to see the underlying pattern by looking at the degree of over and underrepresentation.

Table 8: Degree of Over and Underrepresentation by Race in Children’s Books

Group | Under/Overrepresentation by Demographic | Under/Overrepresentation by Book Buyers
Caucasian | 32% | 0%
African-American | -71% | 32%
Asian | -44% | -64%
Hispanic | -90% | -70%
Native American | -8% | 178%


As can be seen, when looking only at the overall demographics, Whites are 32% overrepresented, African-Americans are underrepresented by 71%, Asians by 44%, Hispanics by 90% and Native Americans by 8%. So while a material number of books with significant content for each of the populations is published each year, the odds of a minority child seeing a minority protagonist are materially lower than those of a white child seeing a white protagonist.

The argument that publishers are consciously or unconsciously biased against minority literature is much harder to sustain from this analysis. Publishers are commercial ventures that earn money by meeting the demands of the market. Based on this, the proper comparison is not to the demographic split but to the book buying split. When this is done, the results indicate that publishers have just about exactly matched content to the White book buying segment of the market. They are overserving black content books by 32%, and Native American by a whopping 178%. On the other hand, it seems clear that they are materially underserving the Asian and Hispanic segments of the market by 64% and 70% respectively.
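
The over and underrepresentation figures are ratios of a group's share of published books to a reference share, minus one. A minimal sketch, assuming that formula and using the rounded values from Table 7 (the function name `representation_gap` is illustrative):

```python
def representation_gap(book_share, reference_share):
    """Percent by which book_share exceeds (+) or falls short of (-) reference_share."""
    return 100 * (book_share / reference_share - 1)


# Rounded figures from Table 7:
# (population %, book buying market share %, CCBC book share %).
table7 = {
    "Caucasian":        (63.7, 84.0, 84.0),
    "African-American": (12.6,  3.0,  3.6),
    "Asian":            ( 4.8,  7.0,  2.7),
    "Hispanic":         (16.4,  6.0,  1.7),
    "Native American":  ( 0.9,  0.3,  0.8),
}

# For each group: (gap versus population share, gap versus buyer share).
gaps = {
    group: (representation_gap(books, pop), representation_gap(books, buyers))
    for group, (pop, buyers, books) in table7.items()
}

for group, (vs_pop, vs_buyers) in gaps.items():
    print(f"{group:<17} vs population {vs_pop:+5.0f}%   vs buyers {vs_buyers:+5.0f}%")
```

With these rounded inputs the population column matches Table 8 for four of the five groups, while the Native American figure and the book buyer column differ by several points (for example, about +20% rather than 32% for African-Americans). Presumably Table 8 was computed from unrounded shares; the direction of each result is the same either way.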

The CCBC noted that while in general the number of books including people of color content had plateaued over recent years, the number addressing Asians has been rising. So perhaps the underrepresentation of the Asian segment of the market is just a function of recentness of immigration and is simply a timing issue. In addition, the category of Asian for Census purposes masks material underlying cultural and historical differences. Chinese are not Japanese are not sub-continental Indians are not Koreans are not Pacific Islanders. These differences, combined with differences in the timing of immigration in recent decades, may have slowed the publishing industry uptake. Slowed perhaps, but certainly not prevented, as indicated by the rising trend.

The real mystery in these numbers is the category of Hispanic. The same issue of sub-category disparity (Mexicans, Cubans, Puerto Ricans, Dominicans, etc. all being perhaps distinct markets) might be at play here but I am skeptical that that would have all that great an impact. The Mexican component is such a large percentage of overall Hispanic, that it ought to be a robust market on its own. It is possible that the issue of legality of residence might be influencing these numbers. Specifically, since 25-35% of the Hispanic population are estimated to be here illegally, AND since such status issues (along with the associated need for frequent mobility) might disproportionately suppress acquisition of optional material goods such as books, it is conceivable that the actual book buying market among Hispanics is materially less than has been calculated here. However, that is pure speculation.

So what can we conclude from this exercise?
• There are significant differences in the culture of reading among the different ethnic groups affecting the propensity to read, the preference of books over newspapers and magazines and the volume of elective reading done in a year.

• Across the board, ethnic minorities are underrepresented in books published each year when compared to their proportion of the overall population.

• When compared to the proportion of races among the book buying population, publishers seem to be accurately estimating the market for Whites, overestimating the market for Blacks and Native Americans (completely ignoring the possible mitigating factor of cross-over appeal), and underestimating the market for Asians and Hispanics.

• Underrepresentation of Asians may be simply a function of timing and/or sub-category diffusion.

• Underrepresentation of Hispanics does not have an obvious root cause though there are several conceivable ones.

• Publishers do not appear to be inhibited in publishing minority content books commensurate with perceived demand.

• Publishers’ proclivity to publish minority content appears to be driven by market forces rather than bias or prejudice.

• This analysis suggests that the root causes for publishing disparities arise from cultural variations that manifest in differences in choices and decision-making. Specifically, there are four root causes that lead to underrepresentation in the market of books. These four root causes are 1) low elective reading rates, 2) low volumes of time spent reading, 3) low preference for book reading compared to newspaper and magazine reading, and 4) low propensity for buying books.

• Addressing these four root causes should yield greater representation in the number of books with significant minority content.

This analysis would appear to support the original recommendations:
1) Increase demand for books in general and the habit of reading in particular within each constituency (as well as at large)

2) Increase quality of books (broadly defined but especially in terms of editorial review)

3) Increase the cultural and societal value attached to enthusiastic reading

4) Improve or create better market making mechanisms for matching supply with demand

5) Improve forecasting competency to improve the yield of profitable books to the total number of published books

It is important to ensure clarity: This analysis is by no means foolproof. It does not have the robustness of a single large-scale empirical study. The sources of information for this analysis are individually reasonably reliable and robust in their own context, but using multiple data sources always introduces an ungovernable source of potential error. While the numerical results of this analysis might vary in degree compared to a single large-scale empirical study, it is reasonable to assume that the directionality of the analysis is likely correct.

With all that said, this analysis does move us in the right direction: away from simple raw speculation, which can be neither affirmed nor refuted, toward a glimpse of a testable numerical reality, flawed though it might be.

ADDENDUM:

An interesting article, What is the Business of Literature? by Richard Nash, explores the dynamics of the publishing industry and speculates about the future. In its scope and focus on fundamentals, it is something of a rebuke of those who look at publishing through monocular lenses of gender or race or class, etc. There are interesting insights through those narrow lenses but they risk missing the forest for the trees.

Also: Alexis Madrigal in The Atlantic has a lengthy post (A Day in the Life of a Digital Editor, 2013 by Alexis C. Madrigal) that lays out the production and financial challenges of journalistic content production from a magazine and internet perspective. He includes hard numbers, which make the discussion much more meaningful. The genesis is a post by a freelance journalist upset about the compensation models used by The Atlantic. Madrigal's response might be summarized as: All due respect, but that's the way it is right now in the industry, here are the reasons why, none of us like it, but there it is.

There is a lot of great discussion in the piece, too much to reflect in a brief excerpt. These three paragraphs will have to suffice.

I can already see some old-school journalists tearing up. This poor kid, he looks at the numbers and ergo, that's all he cares about. "Traffic," they spit. And I get it. The word has been used to bludgeon you into dumb shit. To put great stories on the shelf to build slideshows. To give up on quality and focus on quantity. I do get all that. But that's precisely why we (journalists) must understand the numbers! The business side of any publication knows them inside and out. If we don't understand how to tell good stories with our own data, who do you think wins any argument that involves data, which they all do? (You can know money is important without succumbing to the idea that cash rules everything around you.)

Let me try to convince you of this: We can have binocular vision. We can understand these numbers. And we can know that the mission of a place like The Atlantic is to bring moral purpose, interesting ideas, great arguments, and excellent reporting to the world and to drive these stories as far as they will go into the public consciousness.

Furthermore, looking at the numbers teaches you about the social reality of the Internet. In a very real sense, unless you look at the numbers, you do not know what (the dynamic sociotechnical space that is) the Internet looks like. Your view lets you see its boulevards and parks, but it is like a photograph from the 1850s when the exposure times were too long to capture moving people. Your Paris is empty.

I like that one particular line: "looking at the numbers teaches you about the social reality". Numbers are merely a reflection, sometimes accurate and sometimes distorted, of reality. They can lead the gullible astray as easily as guide the attentive wisely. But if you don't know the numbers, you are shouting in a vacuum.

Also: See We Aren’t the World by Ethan Watters for the challenges regarding cultural variation.


Data Sources

Life Outcomes Discussion - Consequences of Childhood Adversity - The Protective Effect of Family Strengths in Childhood against Adolescent Pregnancy and Its Long-Term Psychosocial Consequences by Susan D Hillis et al (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2937841/)

Consequences of Childhood Adversity - Childhood Abuse, Household Dysfunction, and Indicators of Impaired Adult Worker Performance by Robert F Anda et al (http://xnet.kp.org/permanentejournal/winter04/childhood.pdf)

Table 1: Percentage of the Population - US 2011 Census (http://www.census.gov/compendia/statab/cats/population.html)

Table 2: Percentage of the Population Who Read Electively – National Endowment for the Arts report, Reading At Risk: A Survey of Literary Reading in America (http://www.nea.gov/pub/readingatrisk.pdf)

Table 3: Hours Spent Reading Each Year – Bureau of Labor Statistics: American Time Use Survey (http://www.bls.gov/tus/)

Table 4: Percent of Time Spent Reading Books – Generation M2: Media in the Lives of 8 – 18-Year-Olds, Kaiser Family Foundation Study (http://www.kff.org/entmedia/upload/8010.pdf)

Table 5: Percent of each Group Above the National Household Median Income - Income Distribution by Race - US 2011 Census, Historical Income Tables: Households: Table H-17. Households by Total Money Income, Race, and Hispanic Origin of Householder (http://www.census.gov/hhes/www/income/data/historical/household/)

Number of new titles – Bowker (http://www.bowker.com/assets/downloads/products/isbn_output_2002-2011.pdf)

Table 7: Racial Representation in Book Publishing – Cooperative Children’s Book Center Analysis - Observations on Publishing in 2011 by Kathleen T. Horning, Merri V. Lindgren, and Megan Schliesman (http://www.education.wisc.edu/ccbc/books/choiceintro12.asp)
