Book review: Noise: A Flaw in Human Judgment, by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein.
Doctors are more willing to order a test for patients they see in the morning than for those they see late in the day.
Asylum applicants' chances of prevailing may be as low as 5% or as high as 88%, purely depending on which judge hears their case.
"Clouds Make Nerds Look Good," in the sense that university admissions officers give more weight to academic attributes on cloudy days.
These are examples of what the authors describe as an important and neglected problem.
A more precise description of the book's topic is unwanted variability in judgments, with judgment defined as "measurement in which the instrument is a human mind".
Neglected?
The authors claim that noise has been neglected, and they have no trouble providing examples that fit that pattern.
Yet I’m able to find examples of noise getting possibly undue attention:
- the scientific establishment's obsession with p-values, whose main effect is to decide whether a study has adequately avoided harm from variation in measurement. Much of that concerns variation in mechanical measurement, but plenty of studies subject human judgments to p-values as well.
- the push for national standardized testing in schools.
These don’t frame noise as a distinct cause of harm, so they don’t quite fit the book’s stereotype of what’s neglected, but they weaken the claim of neglect.
The book is at least partly right, in that reducing noise often seems more tractable than reducing other errors. The paper Bias, Information, Noise: The BIN Model of Forecasting provides evidence that even when people focus on reducing bias, the main benefits come from reducing noise.
Underestimated?
The authors provide clear examples of people underestimating the magnitude of noise. E.g. an insurance company that was oblivious to the wide disparities among its agents over what price to quote for a policy.
Fingerprint identification is another example. Fingerprint experts make mistakes, and some of those mistakes could be avoided by following the book’s advice. But mostly those experts do a good job of minimizing the harm caused by those mistakes, with one glaring exception: until recently, experts claimed the identifications were infallible.
In contrast, the book How to Measure Anything seems to focus on an important category of situations where people overestimate noise, using it as an excuse to avoid measuring something.
So I wish they’d asked more questions about when it’s underestimated.
Is it only underestimated when people have a high opinion of the judge?
Is noise underestimated outside of WEIRD cultures? Is it a function of belief in human superiority?
Choosing the Best Categories
Organizations all over the world see bias as a villain. They are right. They do not see noise that way. They should.
Is noise, as they’ve defined it, a category that deserves much attention? Would it be better to focus on error reduction in general?
The scientific establishment has accomplished a good deal by framing their goals as minimizing errors. When that has gone wrong, it’s mostly been by Goodharting on a specific metric, not from confusion about whether to prioritize attention to bias, noise, or something else.
Tetlock has also accomplished some important error reduction that includes a good deal of noise reduction, without focusing on noise specifically. In fact, parts of this book are good because they generalize the ideas in Tetlock’s work to other domains.
In sum, they failed to convince me that it’s valuable to frame noise as a special villain.
Should I Use their Strategies?
Are these ideas of much use to me? Yes, in the sense that I’ve already improved my stock market performance a good deal by inferring what rules I’m using, partially automating those rules, then noticing how I can improve both those rules and my intuitive decisions. Some of that improvement came from noticing biases. Some came from realizing that I wasn’t wise enough to find good complex rules.
The book mostly consists of improved packaging of ideas that have been around for 50 years, so it’s not like much of the advice is new to me.
Many of the ideas in the book require multiple people in order to work. Noise reduction is not important enough for me to abandon my practice of working alone.
I’ve adopted enough of the other advice that I’ve reached diminishing returns, so I don’t see … wait a minute, what about the advice to rank choices via pairwise comparisons, instead of rating the choices? I sometimes do that implicitly, but I never thought about doing it explicitly. It feels unnatural, counterintuitive, and likely hard, but I definitely don’t have the kind of justification I’d need to reject this advice before trying it. That seems like something I might be resisting due to pride in my cognitive abilities.
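To see why pairwise comparison might be worth trying despite feeling unnatural, here is a minimal sketch of how ranking by comparisons could work in practice. Everything here is hypothetical: the ticker names, the `prefer` function, and the toy scores standing in for intuitive judgments.

```python
from functools import cmp_to_key

def prefer(a, b):
    """Hypothetical pairwise judgment: return a negative number if
    stock `a` is preferred to `b`, positive if `b` is preferred.
    In practice this would be a human judgment made one pair at a
    time; here it's stubbed with a toy score table."""
    scores = {"AAA": 3, "BBB": 1, "CCC": 2}  # stand-in for intuition
    return scores[b] - scores[a]

stocks = ["BBB", "AAA", "CCC"]
# sorted() only ever asks the comparator about two items at a time,
# so the judge never has to place an item on an absolute scale.
ranked = sorted(stocks, key=cmp_to_key(prefer))
print(ranked)  # ['AAA', 'CCC', 'BBB']
```

The appeal of this design is that each individual judgment is a small, concrete question ("which of these two is better?") rather than an absolute rating, which is exactly where the seven-plus-or-minus-two limit bites.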
Political Issues
The authors are cautious about exploring the political ramifications of their ideas. I'll try to be slightly bolder.
One of their better complaints is that courts prevent juries from hearing what punitive damages have been awarded in similar cases, presumably to prevent juries from anchoring on those numbers. The authors are pretty sure that just results in juries anchoring on more arbitrary numbers, making the awards more arbitrary for no obvious benefit.
The authors argue that it’s often good to replace human judgment with simple automated rules that are inferred from observing what human judges are trying to do. “The whole trick is to decide what variables to look at and then to know how to add.” (from Linear models in decision making).
There are plenty of places where this advice could be usefully applied if people cared more about reducing noise. E.g. someone could observe what variables judges are trying to use in choosing what sentences to impose, then generate a rule for how to combine those variables in an automated way that bypasses the human judgment.
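A minimal sketch of what such a rule might look like, in the spirit of the quoted advice: pick the variables, standardize them, and just add, with equal weights. The case data, variable names, and scores below are all hypothetical illustrations, not anything from the book.

```python
def standardize(values):
    """Convert raw values to z-scores so variables are comparable."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    sd = var ** 0.5 or 1.0  # guard against a zero-variance column
    return [(v - mean) / sd for v in values]

# Hypothetical features a judge says they weigh when sentencing:
# offense severity, number of prior offenses, mitigating factors.
cases = {
    "case_a": (7, 2, 1),
    "case_b": (4, 0, 3),
    "case_c": (9, 5, 0),
}

severity, priors, mitigation = map(list, zip(*cases.values()))
cols = [standardize(severity),
        standardize(priors),
        [-m for m in standardize(mitigation)]]  # mitigation lowers the score

# Equal-weight additive rule: no human judgment in the combination step.
scores = {name: sum(col[i] for col in cols)
          for i, name in enumerate(cases)}
# Higher score -> harsher recommended sentence, applied uniformly.
```

Once the variables are chosen, every case is scored the same way, which is where the noise reduction comes from: two identical cases can no longer receive different sentences because of which judge, or which time of day, they happened to draw.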
Awareness that noise is often underestimated should be applied more forcefully to the noise in our choices about which rules to adopt. E.g. we should remind ourselves of examples such as Oregon presuming that people are unable to safely pump their own gas. Sure, there are some special interests involved, but there are also plenty of voters who simply commit to arbitrary beliefs for no good reason.
Counterarguments
The book lists a number of objections to noise reduction strategies, and is maybe a bit too polite about replying to them.
The most important class of objections involves the costs of reducing noise. If the costs of noise were well quantified, I’d expect these objections to start sounding relatively petty.
But judgments about the costs of noise are themselves disturbingly noisy, so we should give fairly high priority to applying noise reduction strategies to those judgments. Maybe we should even apply noise reduction strategies to judgments about when to prioritize applying noise reduction strategies. But please don't go beyond two levels of meta.
The other main class of objections seems to revolve around dignity. It's an affront to my dignity to say that I can't meaningfully rate stocks into more than seven plus or minus two categories as reliably as I can rank them via pairwise comparisons. It's an affront to doctors' dignity to distrust their judgment about when to wash their hands. It's an affront to the dignity of a defendant to say that courts should ignore all the help they've given to homeless puppies.
I don’t mean to imply that dignity has no value. I mainly want to ridicule the idea that human dignity is priceless.
Conclusion
This is a moderately valuable book, whose lessons will likely be neglected. Some of its claims seem overstated, but in relatively harmless ways.