Ferguson was the modeler at Imperial College London whose model forecast very high mortality rates in the UK and was used early in the crisis to shape British Covid-19 policy. Subsequently, people were reminded that Ferguson's forecasting record was checkered. Later yet, he and his married mistress broke Britain's lockdown rules in order to continue their assignations, leading to his resignation from the Covid-19 task force.
His model has begun to receive a lot of diagnostic attention to discover why and where the exaggerated forecasts arose. In many ways this seems a replay of the University of East Anglia affair at the height of the Anthropogenic Global Warming charade. In 2009, what became known as Climategate was set off by a hack or leak of emails from UEA revealing many questionable assertions, practices, and positions. The hack/leak led to multiple inquiries which officially found nothing amiss, but the reading public could not help noticing questionable professional behavior and a distinct absence of rigor and professionalism.
The most striking exchange I read was the emails of a programmer who had been contracted by UEA to review the model. His increasingly despairing plaints as he discovered more and more errors, broken feeds, formula discrepancies, undocumented model changes, and the like are familiar to anyone with a background in systems integration. It seemed obvious that the whole climate unit had a low proficiency in data integrity and model management, far below anything acceptable in the business world.
Reading Denim's review of Ferguson's model is a similar experience. Regardless of Ferguson's personal traits (arrogance, attention-seeking, etc.), regardless of whether he harbored any ideological biases, regardless of his intelligence, and regardless of his academic reputation, he was a poor modeler.
The whole article is interesting but these last three observations seem especially pertinent.
An average of wrong is wrong. There appears to be a seriously concerning issue with how British universities are teaching programming to scientists. Some of them seem to think hardware-triggered variations don’t matter if you average the outputs (they apparently call this an “ensemble model”).
Averaging samples to eliminate random noise works only if the noise is actually random. The mishmash of iteratively accumulated floating point uncertainty, uninitialised reads, broken shuffles, broken random number generators and other issues in this model may yield unexpected output changes but they are not truly random deviations, so they can’t just be averaged out. Taking the average of a lot of faulty measurements doesn’t give a correct measurement. And though it would be convenient for the computer industry if it were true, you can’t fix data corruption by averaging.
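To make that concrete, here is a toy sketch of my own (nothing in it comes from the ICL code; the names and numbers are invented purely for illustration): averaging many runs cancels an error only when that error is zero-mean random noise, while a systematic bug skews every run the same way and survives any amount of averaging.

```cpp
// Illustrative sketch only: averaging removes zero-mean random noise,
// but not a systematic error that biases every run the same way.
#include <iostream>
#include <random>

int main() {
    const double true_value = 100.0;
    const int runs = 100000;
    std::mt19937 rng(1);
    std::normal_distribution<double> noise(0.0, 5.0);

    double sum_noisy = 0.0, sum_buggy = 0.0;
    for (int i = 0; i < runs; ++i) {
        sum_noisy += true_value + noise(rng);        // zero-mean random error: averages away
        sum_buggy += true_value * 1.2 + noise(rng);  // systematic 20% error: does not
    }
    std::cout << "average of noisy runs: " << sum_noisy / runs << '\n';  // ~100
    std::cout << "average of buggy runs: " << sum_buggy / runs << '\n';  // ~120, the bug survives averaging
}
```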
I’d recommend all scientists writing code in C/C++ read this training material from Intel. It explains how code that works with fractional numbers (floating point) can look deterministic yet end up giving non-reproducible results. It also explains how to fix it.
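For anyone who doesn't want to wade through the Intel material, the core of the issue fits in a few lines. This is my own minimal sketch, not code from the model: floating point addition is not associative, so anything that reorders a long summation (vectorisation, threading, a different optimisation flag) can change the low-order bits of the result, and over a long iterative simulation those small differences compound.

```cpp
// Sketch: the same numbers summed in two different orders usually give
// slightly different float results, because rounding depends on the order
// of the additions. This is why "identical" code can stop being reproducible
// once the compiler or runtime reorders the arithmetic.
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(7);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> xs(1000000);
    for (auto& x : xs) x = dist(rng);

    float forward = 0.0f, backward = 0.0f;
    for (std::size_t i = 0; i < xs.size(); ++i) forward += xs[i];
    for (std::size_t i = xs.size(); i-- > 0; )  backward += xs[i];

    // Mathematically identical sums; in float arithmetic they typically
    // differ in the trailing digits.
    std::printf("forward:  %.6f\nbackward: %.6f\n", forward, backward);
}
```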
Processes not people. This is important: the problem here is not really the individuals working on the model. The people in the Imperial team would quickly do a lot better if placed in the context of a well run software company. The problem is the lack of institutional controls and processes. All programmers have written buggy code they aren’t proud of: the difference between ICL and the software industry is the latter has processes to detect and prevent mistakes.
For standards to improve academics must lose the mentality that the rules don’t apply to them. In a formal petition to ICL to retract papers based on the model you can see comments “explaining” that scientists don’t need to unit test their code, that criticising them will just cause them to avoid peer review in future, and other entirely unacceptable positions. Eventually a modeller from the private sector gives them a reality check. In particular academics shouldn’t have to be convinced to open their code to scrutiny; it should be a mandatory part of grant funding.
The deeper question here is whether Imperial College administrators have any institutional awareness of how out of control this department has become, and whether they care. If not, why not? Does the title “Professor at Imperial” mean anything at all, or is the respect it currently garners just groupthink?
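Since "unit test" evidently means little to some of the commenters Denim describes, here is the sort of trivial regression test a well run software company would insist on. It is a hypothetical sketch of mine; run_simulation is a made-up stand-in, not a function from the ICL code. The idea is simply that, with a fixed seed, the model must produce identical output every time it runs.

```cpp
// Hypothetical reproducibility test. run_simulation() is a toy stand-in for
// any seeded stochastic model; the point is only that a fixed seed must give
// bit-identical output on every run.
#include <cassert>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

std::vector<double> run_simulation(std::uint64_t seed, int days) {
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> growth(0.9, 1.3);
    std::vector<double> infected{100.0};
    for (int d = 1; d < days; ++d)
        infected.push_back(infected.back() * growth(rng));
    return infected;
}

int main() {
    // Same seed, same inputs => identical output. If this ever fails,
    // something non-deterministic (uninitialised memory, thread ordering,
    // a changed random number stream) has crept into the code.
    const auto a = run_simulation(42, 365);
    const auto b = run_simulation(42, 365);
    assert(a == b && "simulation is not reproducible for a fixed seed");
    std::cout << "reproducibility test passed\n";
}
```

A test this simple is exactly what catches uninitialised reads and broken random number handling before a paper, let alone a national policy, is built on the output.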
Insurance. Someone who works in reinsurance posted an excellent comment in which they claim:
There are private sector epidemiological models that are more accurate than ICL’s.
Despite that, they’re still too inaccurate, so they don’t use them.
“We always use 2 different internal models plus for major decisions an external, independent view normally from a broker. It’s unbelievable that a decision of this magnitude was based off a single model.”
They conclude by saying “I really wonder why these major multinational model vendors who bring in hundreds of millions in license fees from the insurance industry alone were not consulted during the course of this pandemic.”
A few people criticised the suggestion for epidemiology to be taken over by the insurance industry. They had insults (“mad”, “insane”, “adding 1 and 1 to get 11,000” etc) but no arguments, so they lose that debate by default. Whilst it wouldn’t work in the UK where health insurance hardly matters, in most of the world insurers play a key part in evaluating relative health risks.
Like everything in our current reasonably comprehensive state of unknowing, there is marked debate within the comments as to whether Denim is making useful points. Since many of the criticisms seem to be logical fallacies of one sort or another, I am inclined to believe she is probably closer to correct than otherwise. But regardless, these last three points are useful observations that hold broadly true across many controversies.
See the original article for all the links omitted here.