Friday, February 9, 2018

When well-intentioned micro purity creates strategic system fragility

A couple of days ago I posted Performance differentials in bias and discrimination free systems. That post was an exploration of the role empirical evidence plays in understanding productivity, compensation, variance, personal choices, bias, and discrimination. The post originated from a study by a group of academics who realized that Uber drivers present a unique opportunity to examine wage gap variance, because the Uber system precludes any opportunity for human discrimination.

What they found was that there was a 7% difference in comparable compensation between men and women, even though there was no possible human bias or discrimination. The 7% compensation difference in favor of men was due primarily to men driving about 2% faster (roughly 50% of the gap), men being more experienced and practiced with the Uber system (about 30%), and men preferring to compete for certain types of routes, such as airport runs (about 20%). In other words, the 7% variance was entirely due to personal choices and behaviors of the individual participants, which in turn generated higher levels of productivity.
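As a rough arithmetic sketch of that decomposition (the 50/30/20 shares are the approximate figures cited above, not exact values taken from the study), the contributions in percentage points work out as follows:

```python
# Rough decomposition of the ~7% earnings gap into the three cited factors.
# The shares are the approximate figures quoted above, not exact study values.
total_gap_pct = 7.0

factor_shares = {
    "driving speed (men drive ~2% faster)": 0.50,
    "experience and practice with the Uber system": 0.30,
    "route and trip selection (e.g. airport runs)": 0.20,
}

for factor, share in factor_shares.items():
    print(f"{factor}: {share * total_gap_pct:.1f} percentage points")
# driving speed: 3.5, experience: 2.1, route selection: 1.4 (summing to 7.0)
```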

One of the points I make in that earlier post is that when doing multivariate analysis on a large data set, it is not uncommon for the system to display zero evidence of differential outcomes (equal pay for equal work) at the aggregate level and yet still contain individual instances of bias and discrimination, as long as those instances of bias are randomly distributed.
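A minimal simulation sketch of that claim (all numbers here are hypothetical and chosen only to illustrate the mechanism): if individual instances of bias are real but randomly directed, the aggregate gap between two groups washes out even though a meaningful share of individuals in each group was genuinely affected.

```python
# Sketch: individually real but randomly directed bias produces no aggregate gap.
# All numbers are hypothetical, chosen only to illustrate the point above.
import random

random.seed(0)
N = 100_000          # workers per group
BIAS_RATE = 0.10     # 10% of workers encounter a biased decision-maker
BIAS_EFFECT = 0.05   # that bias shifts pay by +/- 5%

def simulated_pay(base=50_000):
    if random.random() < BIAS_RATE:
        # The bias is real for this individual, but its direction is random:
        # it is as likely to favor this worker's group as to penalize it.
        return base * (1 + random.choice([-1, 1]) * BIAS_EFFECT)
    return base

group_a = [simulated_pay() for _ in range(N)]
group_b = [simulated_pay() for _ in range(N)]

gap = sum(group_a) / N - sum(group_b) / N
print(f"Aggregate pay gap between groups: ${gap:,.2f}")   # close to zero
print(f"Individuals touched by bias: about {BIAS_RATE:.0%} of each group")
```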

It is the implication of that observation that I wish to explore.

Based on recent US history (racial prejudice in the 1960s and earlier), it was indisputable that black American citizens were being denied their full rights. The evidence was in everyday experience and could be seen in various socioeconomic measures such as family formation, income, educational attainment, etc. I would argue that the obviousness of the discrimination carried by far the greater weight of the argument.

We passed the civil rights legislation of the 1960s, dusted off our hands, sat back, and waited for the systemic changes to take effect. While waiting, activists turned their eyes to other forms of apparent discrimination: against Hispanics, against Asian American immigrants, against women, against LGBT people, etc.

And here is where I think we began tripping ourselves up. It is in many ways more difficult to see anything other than the most obvious discrimination against most of these other groups, especially in the case of women. The issue was exacerbated because many of the most vocal activists on behalf of women's rights were upper-middle-class, educated white women.

It was very hard to see anything visually comparable to the Selma March when looking at how middle-class, educated white women were treated.

I hypothesize that what happened was that those advocates, in the absence of obvious visual evidence, ended up going to the numbers to make the case that women suffered immense discrimination in a fashion comparable to African Americans.

And I think that is where things came off the track. We did not numerically prove racial discrimination and then fix that discrimination through revised laws. We saw that racial discrimination; we felt it. The more extreme feminist activists, with no comparably visceral visual argument, ended up relying on the numbers instead. More specifically, they went searching for disparate impact. What they failed to recognize was that the civil rights legislation did not emerge from a numerical analysis of the disparate impact of racial discrimination. Support for civil rights legislation arose because we saw the funerals of little black girls killed in church bombings, because we saw fellow citizens being beaten for their exercise of free speech, and because we had the eloquence of Martin Luther King to give voice to the tragedy being acted out on our screens.

And the challenge for 1970s feminists was that the numbers are much more ambiguous than we readily acknowledge. Disparate outcomes from an equal process can arise for all sorts of reasons separate from intentional bias and discrimination. The Uber research is just a very clean and compelling example.

The 1970s feminists wanted to prove that middle-class, educated, white women were experiencing systemic and substantial discrimination from the patriarchy. Because the visual evidence was not compelling, they had to find it in the numbers. But in order to identify the actual scope of discrimination, you have to exclude the obvious causal elements that might account for a difference. What these might be depends on the subject, but typical causal factors which lead to disparate outcomes include: educational attainment, quality of education, portfolio of skills, years in practice, country of birth, religious orientation, culture, socioeconomic status, genetic heritage, height, weight, morbidity, year of birth (separate from age), epoch, foreign languages spoken, continuity of work, month of birth, etc.

Disparate outcomes among different groups subjected to the same equal process are not, by themselves, evidence that the system is biased. In fact, it is very difficult to find processes where differential outcomes are demonstrably due only, or even primarily, to bias and discrimination. Some such instances do exist, but there are far more instances of disparate impact owing to other, non-discriminatory causal factors.

But I suspect our 1960s legislation, which addressed real prejudice and discrimination, also predisposed us to see all disparate outcomes as necessarily a result of prejudice and discrimination. And that error took us on a decades-long, fruitless journey treating disparate outcomes as ipso facto evidence of bias and discrimination.

Our first error, then, is to do system-level analysis seeking disparate outcomes and to mistake those disparate outcomes for evidence of prejudice and discrimination.

The second error is failing to distinguish what is happening at the system level from what might be happening at the tactical level.

My point in the earlier post is that in a large system there will always be plenty of anecdotal (but real) instances of prejudice and discrimination, because there are in fact plenty of such incidents. We are humans with evolutionary in-group biases (however we define the in-group), which are mitigated only by choice and through self-awareness and self-control.

While it is easy to see, when we are looking for them, instances of particular types of discrimination, what is harder to see is that there are usually an equal number of countervailing incidents of discrimination. Because we are seeking acts of discrimination against one group, we do not look for those very same acts against another group, and because we do not look for them, we do not see them. We commit the error of confirmation bias: we see patterns in selected instances of discrimination, fail to take into account that the system is random, and ignore the countervailing instances of discrimination.

A third error is to overlook the many instances where there is prejudice at an individual level but that prejudice never translates into discrimination. For example, a manager has a native aversion to fat people, yet one of his employees is obese, and the fair-minded manager conscientiously holds that aversion in check and treats the employee fairly.

There are also plenty of instances where discrimination occurs without prejudice; in fact, there is a great deal of this. Your population of warehouse workers may end up being 80% male, not because you are actively prejudiced against women, but because the job selects for strength and men, on average, have more upper-body strength.
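A minimal simulation sketch of that mechanism (the distributions and cutoff below are entirely hypothetical, not real strength data): a single group-blind threshold applied to a trait whose distribution differs between groups produces a heavily skewed workforce without any group-based decision ever being made.

```python
# Sketch: a group-blind selection rule can still produce a lopsided workforce.
# The distributions and cutoff are hypothetical, not real strength data.
import random

random.seed(1)
N = 50_000

# Hypothetical strength scores: same spread, modestly different means.
men   = [random.gauss(60, 15) for _ in range(N)]
women = [random.gauss(45, 15) for _ in range(N)]

CUTOFF = 70  # lifting-ability requirement, applied identically to every applicant

hired_men = sum(score > CUTOFF for score in men)
hired_women = sum(score > CUTOFF for score in women)

print(f"Share of hires who are men: {hired_men / (hired_men + hired_women):.0%}")
# With these made-up parameters the hired pool skews roughly 80-85% male,
# even though the rule itself never looks at the applicant's sex.
```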

It should be clear that proving bias and intentional discrimination is very difficult. You need system transparency, clarity about causal direction and causal flow, and measurement of elements which are difficult to measure (frame of mind, intent, etc.). Discrimination does exist and it can be proven, but it is rare, and most instances of disparate outcomes arise for perfectly valid reasons.

If the system level is not generating differentials, we can say that the system is fair and be done with it, even though we know that there are some, or even many, individual random incidents of prejudice and discrimination. The random instances of discrimination cancel out; therefore the system is fair.

That is not a philosophically irrational position to take. It is not necessarily the right position to take, but there are clear arguments that such a system is sufficiently fair not to warrant intervention.

But why not work to drive out random instances of discrimination at the tactical level? That would perhaps be ideal. But there is a cost to removing all instances of variance (rather than just ensuring that instances of discrimination are random and self-cancelling). There is a strategic trade-off: every action entails cost and benefit trade-offs.

There is an analogy to manufacturing processes. From 1970 to 2000, American manufacturing companies, facing new global competition, desperately tackled their cost and quality issues with TQM, Six Sigma, and other such strategies. High variance in their manufacturing processes had led to higher costs. With these quality approaches, they drove out the variance and drove down the costs. From a financial and management perspective, reducing internal process variance was not only desirable; it was necessary.

But there are consequences to such approaches. For example, just-in-time lean manufacturing makes you more tactically efficient but also much more strategically susceptible to logistics disruptions. So not only do you have to reduce tactical variance in the manufacturing process, but you also have to find ways to make the system more adaptive and robust.

It goes to the issue that in complex processes, the more reliable you make the system (reduced variance), the more fragile it often tends to become (subject to catastrophic exogenous events). Loose systems have the variance that allows them to strategically adapt and evolve with changing circumstances, whereas tight systems tend to optimize tactical efficiency over strategic effectiveness. The tactical efficiency and strategic effectiveness curves are both subject to exogenous changes, and therefore the curves are constantly shifting.

It is likely that social systems function this way as well. If our aggregate system shows no disparate outcome based on bias and discrimination, but we suspect that there is randomized variance at the individual level, then we have to tackle the problem at the individual level.

And that is much more difficult. It requires a precision and comprehensiveness of measurement which are difficult and oftentimes impossible to achieve. As an example, take the scenario where a woman is introduced into a work crew formerly constituted entirely of men. Assume that the original work crew's conversation is characterized by crude language and off-color joking. Let's say, for the sake of argument, that the men continue those practices. Here is the measurement challenge: how can we know what their intent is? It is possible that the continuance of their past verbal practices is a mark of acceptance: "We acknowledge you as one of us by fully accepting you into our existing fraternity." On the other hand, it is just as plausible that the language is a strategy for exclusion: "We do not accept you and are unwilling to change our ways, which may be unpleasant or uncomfortable for you."

Which is it? Human communication and social practices are incredibly sophisticated and nuanced. They are hard to measure and interpret. We are beginning to ask about intent rather than demonstrated action.

This ambiguity is what we are seeing in much of the current #MeToo movement. We can all accept that there is a legitimate issue to be addressed. There is certainly no room in the workplace for assault, and there should certainly be no coercive or retaliatory behavior. But what are the definitions, and where are the lines? Does a hug from a germophobe who will not shake anyone's hand represent a sign of welcome, or is it an inappropriate advance?

Another part of the issue is that in such a multicultural society it is hard to pin down social norms. But more than that, it is extremely difficult to maintain sufficient legibility and visibility to effectively monitor and track all possible instances of inappropriate variance. We can build systems to do so, but that then creates different issues to do with desirable variance, privacy, and stability.

To shift the example, look at driving and traffic. If we wish to drive out all variance associated with running red lights and speeding, we can do that with a mix of increased police presence, technology monitoring (speed detectors, license plate readers, etc.), and technology standards (vehicle monitoring systems recording their own speed and operating circumstances, to which police can have access and which can be used against the driver). If we wish to remove all inappropriate driving behaviors, we can do so. But it is expensive, it is intrusive, and it makes the strategic system much more fragile.

In the current state, your tactical risks come from other individual drivers and their inappropriate driving behaviors. In a system that drives out all that variance, you have reduced your tactical risks while increasing your strategic risks. The state now has complete visibility into your behavior, you have no privacy, and you are exposed to state malfeasance. You are making an inherent trade-off between frequent, distributed risks from other drivers and rare but concentrated and comprehensive risks from the state. These are the kinds of trade-offs that challenge us.

In human systems, you will get three different outcomes.

1) The system is neutral (all laws apply equally to everyone) at both the aggregate and the individual levels. In this model everyone is treated equally and without discrimination, but there will still be variance in outcomes due to differences in individual human capabilities, interests, priorities, etc.

2) The system is neutral at the aggregate level, but there are individual instances of discrimination which are randomly distributed and cancel each other out. There are still differential outcomes, and some proportion of the system's participants experience random instances of discrimination. It is still in some sense fair in that everyone faces the same odds of discrimination.

3) The system is discriminatory from top to bottom, generating disparate outcomes not due to personal capabilities and preferences but because of manifested bias.

Importantly, all three systems generate disparate impacts even though only one of them has a demonstrated and consistent bias.

Our advocacy groups are still working with mental models that evolved almost by chance out of the experience of successful civil rights legislation of the mid-1960s. That mental model is: IF disparate outcomes THEN active bias and discrimination.

That is not a reliably useful mental model. It can be true, and sometimes is, but not often. Usually there is far more going on: valid individual preferences and capabilities, plus incidental but random discrimination. The challenge in the latter case is that the cure, intrusive control to reduce tactical risk, comes at the cost of making the system more fragile and incurring greater strategic risks.

I am sure the complexity of the above could be articulated more clearly, but at least I have it captured. I think there is a useful insight which confirms that good intentions go astray and that the risk of unintended consequences can be exceptionally high.
1) Pure systems with no tactical or strategic variance (discrimination) exist and have disparate impact based on individual preferences and capabilities.

2) Mixed systems which ensure minimal aggregate variance are common, but such systems frequently have individual, random instances of tactical variance (discrimination).

3) Dysfunctional systems which manifest both strategic system variance and tactical individual variance (non-random discrimination at the system and the individual level) are rare.

Distinguishing Systems 1, 2, and 3 from one another is exceptionally difficult to do simply by focusing on disparate outcomes. All three will reliably demonstrate disparate outcomes, even the systems with no consistent discrimination at all.
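A minimal simulation sketch of why the three are hard to tell apart from outcomes alone (all parameters below are hypothetical): each system type produces a positive outcome gap between two groups, so the gap by itself does not reveal which kind of system generated it.

```python
# Sketch: all three system types show an outcome gap, so the gap alone does not
# identify which system you are in. All parameters are hypothetical.
import random

random.seed(2)
N = 100_000

def outcome(group, system):
    # Underlying capability differs slightly by group in every system,
    # standing in for individual preferences and capabilities.
    base = random.gauss(100 if group == "A" else 97, 10)
    if system == 1:                                   # neutral top to bottom
        return base
    if system == 2:                                   # random, self-cancelling bias
        if random.random() < 0.10:
            return base + random.choice([-5, 5])
        return base
    return base - (3 if group == "B" else 0)          # system 3: consistent bias vs. B

def gap(system):
    a = [outcome("A", system) for _ in range(N)]
    b = [outcome("B", system) for _ in range(N)]
    return sum(a) / N - sum(b) / N

for s in (1, 2, 3):
    print(f"System {s}: group A outperforms group B by {gap(s):.1f} points")
# All three print a positive gap; only System 3's gap reflects consistent bias.
```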
