Monday, March 2, 2015

Algorithms, humans and faulty root cause analysis

From Algorithm Aversion by Alex Tabarrok. The abstract of the paper he is highlighting is:
Research shows that evidence-based algorithms more accurately predict the future than do human forecasters. Yet, when forecasters are deciding whether to use a human forecaster or a statistical algorithm, they often choose the human forecaster. This phenomenon, which we call algorithm aversion, is costly, and it is important to understand its causes. We show that people are especially averse to algorithmic forecasters after seeing them perform, even when they see them outperform a human forecaster. This is because people more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake. In five studies, participants either saw an algorithm make forecasts, a human make forecasts, both, or neither. They then decided whether to tie their incentives to the future predictions of the algorithm or the human. Participants who saw the algorithm perform were less confident in it, and less likely to choose it over an inferior human forecaster. This was true even among those who saw the algorithm outperform the human.
Reading only the abstract, I have concerns about the conclusions drawn.

The first concern is that the researchers fail to mention the effect size in the abstract. Are we talking about a 1% difference in the choice of algorithm versus human, or a 70% difference? When researchers fail to report effect size, my first inclination is to assume that the effect is small, possibly even within the margin of error.

The other concern is whether the researchers have properly taken into account the chooser's risk context. If I observe an algorithm make a routine mistake in one context, I may become concerned about its capacity to perform in a different context.

In other words, when making forecasts you are dealing with uncertainty. The question becomes: what is the standard deviation of the data? If it is very broad, then the subsequent consideration is whether the algorithm is tuned for the full range of variability or only a small part of it. The concern is magnified by seeing that the algorithm is not self-learning: it will make the mistakes it is programmed to make. So if there is a black swan circumstance out there, beyond the range of variance for which the algorithm is tuned, then there is a risk that the algorithm will fail to recognize it and make a catastrophically bad forecast.
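A minimal sketch of that failure mode, with invented numbers (nothing here comes from the paper): an algorithm tuned to a narrow band of historical variability forecasts well in normal periods and catastrophically badly when a shock outside that band arrives.

import random

random.seed(42)

# Training data: returns drawn from a well-behaved, narrow distribution.
# The algorithm's forecast is simply the historical mean; it was never
# tuned for anything outside this range.
training = [random.gauss(0.05, 0.02) for _ in range(1000)]
mean_forecast = sum(training) / len(training)

def realized_return():
    """Live environment: mostly the same process, but with a 1% chance
    of a -40% black swan that the training data never contained."""
    if random.random() < 0.01:
        return -0.40, True
    return random.gauss(0.05, 0.02), False

errors_normal, errors_swan = [], []
for _ in range(10_000):
    actual, is_swan = realized_return()
    err = abs(mean_forecast - actual)
    (errors_swan if is_swan else errors_normal).append(err)

print(f"mean forecast error, normal periods:     {sum(errors_normal)/len(errors_normal):.3f}")
print(f"mean forecast error, black-swan periods: {sum(errors_swan)/len(errors_swan):.3f}")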

Under this speculation, humans aren't biased against algorithms per se; they are biased against risk, and they are discounting the algorithm's capacity to function in a broad risk environment. They are anticipating that the human would recognize black swans and mitigate the risk where an algorithm would blindly proceed.

How could this alternate hypothesis be tested? I suspect it would be as simple as offering a guarantee. If you present people with two forecast returns, 5% for the algorithm and 3% for the human, what the researchers are currently finding is that people choose the lower human result, and they ascribe that outcome to simple prejudice against algorithms.

But if you guarantee those returns, then you are asking people to lock in the poorer human performance. My speculation is that with a guarantee, virtually everyone would take the higher return. The lesson I would draw from that is that people are not prejudiced against algorithms per se. They are doubtful about the algorithm's capacity to work under a broad scenario of uncertainty.
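A sketch of that arithmetic, again with invented numbers: a chooser who assigns even a small subjective probability to the algorithm blowing up outside its tuned range can rationally prefer the human's lower forecast, while a guarantee removes the tail and collapses the comparison to 5% versus 3%.

def expected_return(outcomes):
    """Probability-weighted return over (probability, return) pairs."""
    return sum(p * r for p, r in outcomes)

# Assumed subjective beliefs of a risk-conscious chooser, not measured data.
algo_no_guarantee  = [(0.95, 0.05), (0.05, -0.40)]  # small chance of a blow-up
human_no_guarantee = [(1.00, 0.03)]                 # expected to muddle through
algo_guaranteed    = [(1.00, 0.05)]                 # guarantee removes the tail
human_guaranteed   = [(1.00, 0.03)]

print(f"no guarantee: algorithm {expected_return(algo_no_guarantee):.4f} "
      f"vs human {expected_return(human_no_guarantee):.4f}")
print(f"guaranteed:   algorithm {expected_return(algo_guaranteed):.4f} "
      f"vs human {expected_return(human_guaranteed):.4f}")

With these numbers the human wins without a guarantee (0.0300 vs 0.0275) and the algorithm wins with one (0.0500 vs 0.0300), which is exactly the flip the proposed test would look for.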
