In crises like the current one, all governments claim to do “what the experts say.” Alternatively, it’s not what the experts say but what “the science says.” Unfortunately, experts frequently disagree and “science” doesn’t even have an opinion.
More worryingly, experts are often wrong. I recently read Tetlock’s excellent book Expert Political Judgment. It turns out that many political experts cannot outperform the proverbial dart-throwing chimp. Even the best human forecasters had a hard time outperforming sophisticated algorithms.
Now, geopolitical forecasts might be particularly hard. Unfortunately, this problem is not limited to political forecasts. Roger Cooke has developed a method of expert elicitation called the “classical model,” in which experts are validated using calibration questions from their own field of expertise. It appears to be most commonly used in environmental risk assessment. In his review of the method, Cooke finds that most of the experts surveyed cannot make well-calibrated forecasts. Only “eighty-seven of the 320 experts [27%] have statistical accuracy [calibration] greater than 0.05, which means we can reject the hypothesis that the other 233 experts [73%] provided statistically accurate assessments. More than half of the experts have statistical accuracy that is less than 0.005, and about one-third have scores of less than 0.0001.” That is bad.
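To make those “statistical accuracy” numbers concrete, here is a rough sketch of how the classical model scores calibration (a simplification of Cooke’s method, which also weights experts by informativeness; the example hit counts are invented). An expert gives 5%, 50%, and 95% quantiles for seed questions with known answers; we then check how often the realizations land in each inter-quantile bin and compute the p-value of the hypothesis that the expert is calibrated:

```python
import math

def kl_divergence(s, p):
    """KL divergence I(s; p) between empirical and theoretical bin frequencies."""
    return sum(si * math.log(si / pi) for si, pi in zip(s, p) if si > 0)

def chi2_sf_df3(x):
    """Survival function of the chi-square distribution with 3 degrees of
    freedom (closed form, so no scipy dependency is needed)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def calibration_score(hits, p=(0.05, 0.45, 0.45, 0.05)):
    """Cooke-style statistical accuracy: the p-value of the hypothesis that
    the expert's inter-quantile hit counts match the theoretical frequencies.
    `hits` counts realizations below the 5% quantile, between 5-50%,
    between 50-95%, and above the 95% quantile."""
    n = sum(hits)
    s = [h / n for h in hits]
    # 2*n*I(s;p) is asymptotically chi-square with (bins - 1) = 3 df
    return chi2_sf_df3(2 * n * kl_divergence(s, p))

# A well-calibrated expert: over 10 seed questions, hits roughly match (5%, 45%, 45%, 5%)
print(calibration_score([1, 4, 4, 1]))  # high p-value: calibration not rejected
# An overconfident expert: most realizations land outside the stated 90% interval
print(calibration_score([4, 1, 1, 4]))  # tiny p-value, like Cooke's bottom third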
Not close enough to disease modeling just yet? In this expert survey, 18 infectious disease modeling researchers were asked on March 16–17 to estimate the number of COVID-19 cases in the United States one week later. They predicted that 10,567 cases (80% uncertainty interval: 7,061–24,180 cases) would be reported by COVID Tracker on March 23. The actual number was 42,152. That’s far outside even the 80% uncertainty interval. That’s also shocking.
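Just how far outside? The arithmetic on the survey numbers above:

```python
# Numbers from the expert survey cited above: point forecast, 80% interval, outcome.
predicted = 10_567
interval = (7_061, 24_180)   # 80% uncertainty interval
actual = 42_152

inside = interval[0] <= actual <= interval[1]
print(inside)                    # False: the outcome missed the interval entirely
print(actual / interval[1])      # ~1.74: actual was 74% above the interval's upper bound
print(actual / predicted)        # ~3.99: nearly 4x the point forecast
```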
So who are the experts on COVID-19? Based on later research by Tetlock, we would be well advised to trust (super)forecasters with a validated track record, even if they are not subject-matter experts (here are their forecasts on COVID-19). Alternatively, we should trust forecasting platforms, which utilize the wisdom of the crowd (Metaculus on COVID-19, Good Judgment Open on COVID-19). If we are set on using experts, we should at least ask them some validation questions before taking them seriously. As Cooke says: “These results suggest that identifying the top experts in each study and relying on their judgments would be better than relying on expert judgments that have not been validated.”
What makes you think that aggregated forecasts perform well enough to be useful?
Apparently, the Metaculus (averaged) prediction of how many COVID cases there would be on April 27 has increased by ~23% per day since the end of March. This is roughly as fast as the case growth rate of ~1.25/d, which sounds to me as if the predictions are just catching up to the real numbers and have been underestimating the actual impact most of the time, just as the experts have been doing. I speculate this is because people are not good at forecasting exponential processes, and it doesn’t matter much how much averaging/aggregation you’re doing.
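The commenter’s point can be illustrated with a toy model (assumed forecaster behavior, not actual Metaculus data): if forecasters anchor on the latest observed count and scale it by some fixed factor, the forecast series grows at exactly the case growth rate, so no amount of averaging such forecasts anticipates the exponential.

```python
# Toy sketch: anchored forecasts of an exponential epidemic.
# All numbers here are illustrative assumptions, not real data.
growth = 1.25                                    # assumed daily case growth factor
cases = [1000 * growth**t for t in range(10)]    # exponential case counts

fudge = 3.0                                      # anchored forecast: "3x today's count"
forecasts = [fudge * c for c in cases]

# The forecast series grows at the same ~25%/day as the cases themselves...
ratios = [forecasts[t + 1] / forecasts[t] for t in range(9)]
print(ratios[0])                                 # 1.25

# ...so a forecast made h days out is overtaken whenever growth**h
# exceeds the fixed fudge factor.
h = 7                                            # forecast horizon in days
print(growth**h)                                 # ~4.77 > 3.0: a week out, cases beat the anchor
```

Averaging many such anchored forecasts just averages their fudge factors; the growth rate of the aggregate still tracks the cases, which matches the ~23%/day the commenter observed.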
I mean, experts are often useless and it’s good we are calibrating them. But uncritically believing in aggregation doesn’t seem to be any good.
In my view, the best approach is to combine epidemic modeling and forecasting. The problem with Metaculus forecasts is that they bundle two different things: how the virus spreads, and what humans will do. For policymaking, however, the important question is ‘if we do this, what will happen’, as opposed to an aggregated prediction of ‘you will do this, and the consequences will be…’. Hence, epidemicforecasting.org