Monday, July 29, 2019

No, we aren't biased. We are just picking from a biased pool.

A great example of designed systemic bias that is not immediately obvious, from "Analyzing Google News: Introduction" by Greg Coppola.

Since circa 2010-12, the big social media and tech companies such as Amazon, Facebook, and Google have been investing more in lobbying and taking increasingly strong stances on what speech they will permit on their platforms. Were this limited to criminal matters, there would still be a debate, but less concern. Regrettably, the tech companies are not viewpoint neutral, they are not diverse in their views, and they are not representative of the viewpoints of Americans. Hence the increasing concern about their sway in the economy, in society, and in politics.

Google executives, in press releases and in Congressional testimony under oath, have stated unequivocally that Google does not shade its results in order to advance a viewpoint or a political agenda. Well . . . maybe. Or maybe not. Here is a list of some past blog posts exploring the issue. It sure looks like there is some fire to go along with the smoke.

Coppola is a Google employee, recently suspended for discussing Google's active intervention in politics to advance their own commercial agenda as well as their own world view, to the detriment of the average citizen.

In this example, he illustrates how to bias results without being seen to bias results.
We begin by replicating and extending an experiment run originally by Paula Bolyard. I scraped Google News, searching for the query “donald trump”, once a minute, 5000 times. A scrape had 105 stories on average.

Power-Law Distribution Over Sites

We begin by looking at the distribution of publications (or web-sites) that make up our new Google/Trump corpus. In particular, we look at the probability that a randomly selected story comes from each given news site. The results are depicted here:

[Figure: distribution of stories across news sites.]

Note the use of a power-law (or 80/20, or rich-get-richer) distribution. The most-used site, CNN, is selected in 20% of all articles! In other words, even with the millions of sites on the Internet, 1 out of every 5 stories about “donald trump” from Google News is from CNN.
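The calculation behind that chart is simple enough to sketch. The following is a minimal illustration, assuming each scraped story has already been reduced to the name of its source site. The function name `site_shares` and the toy `stories` list are hypothetical and are not Coppola's code or data.

```python
from collections import Counter


def site_shares(sources):
    """Return (site, share) pairs, largest share first.

    `sources` is a list with one source-site name per scraped story.
    The share is the probability that a randomly selected story
    comes from that site.
    """
    counts = Counter(sources)
    total = sum(counts.values())
    return [(site, n / total) for site, n in counts.most_common()]


# Hypothetical toy data for illustration only, not the real corpus.
stories = (
    ["cnn.com"] * 4
    + ["nytimes.com"] * 3
    + ["politico.com"] * 2
    + ["example-local.com"]
)

shares = site_shares(stories)
for site, share in shares:
    print(f"{site}: {share:.0%}")
```

With real data, a heavily skewed list of shares like this is what the power-law shape in the chart reflects.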
Woof. Look at the mainstream media sites. Not only are they famously anti-Trump, but they are also materially unrepresentative of the average American news reader or viewer.

Why would Google need to do anything to create negative content about Trump if all they are doing is pulling from sites that are a priori against Trump?

This issue is even clearer if you look at the cumulative distribution.

[Figure: cumulative distribution of stories across news sites.]

As Coppola says:
In power-law style, 50% of all stories come from the top 5 sites (CNN, USA Today, NYT, Politico, Guardian), and 83% of all stories come from the top 20.
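A figure like "the top 5 sites account for 50%" is a running total over sites ranked by story count. Here is a rough sketch of that arithmetic, again assuming one source-site name per story. The name `cumulative_shares` and the toy data are hypothetical, chosen only to show the mechanics.

```python
from collections import Counter
from itertools import accumulate


def cumulative_shares(sources):
    """Return (rank, site, cumulative_share) tuples.

    Sites are ranked by story count, largest first. The cumulative
    share at rank k is the fraction of all stories that come from
    the top k sites.
    """
    ranked = Counter(sources).most_common()
    total = sum(n for _, n in ranked)
    running = accumulate(n for _, n in ranked)
    return [
        (rank, site, cum / total)
        for rank, ((site, _), cum) in enumerate(zip(ranked, running), start=1)
    ]


# Hypothetical toy data for illustration only, not the real corpus.
toy_stories = (
    ["cnn.com"] * 4
    + ["nytimes.com"] * 3
    + ["politico.com"] * 2
    + ["example-local.com"]
)

cumulative = cumulative_shares(toy_stories)
for rank, site, share in cumulative:
    print(f"top {rank} (through {site}): {share:.0%}")
```

Reading off the row at rank 5 or rank 20 of such a table, built from the actual scrape, is how one would arrive at statements like the ones quoted above.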
All of these outlets have richly documented anti-Trump, anti-Republican biases, documented both in sentiment analysis of their coverage (93% negative) and in terms of who among their employees contributes how much to which party.

By drawing its news results solely from sources with well-documented bias, Google can ensure that its own collation of news will be biased, without ever having to write a single line of biased code that might betray its actions.
