Book review: The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives by Stephen Ziliak and Deirdre McCloskey.
This book provides strong arguments that scientists often use tests of statistical significance as a ritual that substitutes for thought about how hypotheses should be tested.
Some of the practices they criticize are clearly foolish, such as treating data that fall slightly short of providing statistically significant evidence for a hypothesis as grounds for concluding that the hypothesis is false. But for other practices they attack, it’s unclear whether we can expect scientists to be reasonable enough to do better.
Much of the book is a history of how this situation arose. That might be valuable if it provided insights into what rules could have prevented the problems, but it is mainly devoted to identifying heroes and villains. It seems strange that economists would pay so little attention to incentives that might be responsible.
Instead of blaming the problems primarily on one influential man (R.A. Fisher), I’d suggest asking what distinguishes the areas of science where the problems are common from those where they are largely absent. It appears that the problems are worst in areas where acquiring additional data is hard and where powerful interest groups might benefit from false conclusions. That leads me to wonder whether scientists are reacting to a risk that they’ll be perceived as agents of drug companies, political parties, etc.
The book sometimes mentions anti-commercial attitudes among the villains, but fails to ask whether that might be a symptom of a desire for “pure” science that is divorced from real world interests. Such a desire might cause many of the beliefs that the authors are fighting.
The book does not adequately address the concern that if scientists in those fields abandon easily applied rules, they are sufficiently vulnerable to corruption that we’d end up with less accurate conclusions.
The authors claim the problems have been getting worse, and show some measures by which that seems true. But I suspect their measures miss an improvement that has been happening: the increasing pressure to follow the ritual has pushed papers that would previously have been purely qualitative to use quantitative tests, which at least reject the worst ideas.
The book seems somewhat sloppy in its analysis of specific examples. When interpreting data from a study in which scientists decided there was no effect because the evidence fell somewhat short of statistical significance, the book claims the data show that “St. John’s-wort is on average twice as helpful as the placebo”. But the data would support that claim only if the remission rate with no treatment at all were zero. It’s likely that some or all of the alleged placebo effect was due to factors unrelated to treatment, such as spontaneous remission. And the authors’ use of the word “show” suggests stronger evidence than the data provide.
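To make that point concrete, here is a minimal sketch using made-up remission rates (the herb, placebo, and no-treatment figures below are hypothetical, not taken from the study or the book). It shows that a remission rate twice the placebo’s translates into “twice as helpful” only when remission without any treatment is zero.

```python
# Hypothetical numbers only - not the study's actual results.

def benefit(remission_rate, no_treatment_rate):
    """Helpfulness measured as remission beyond what happens with no treatment."""
    return remission_rate - no_treatment_rate

herb_rate = 0.30      # hypothetical remission rate with St. John's wort
placebo_rate = 0.15   # hypothetical remission rate with placebo

for baseline in (0.0, 0.10, 0.15):   # hypothetical no-treatment remission rates
    herb_benefit = benefit(herb_rate, baseline)
    placebo_benefit = benefit(placebo_rate, baseline)
    ratio = herb_benefit / placebo_benefit if placebo_benefit > 0 else float("inf")
    print(f"baseline {baseline:.0%}: herb benefit {herb_benefit:.0%}, "
          f"placebo benefit {placebo_benefit:.0%}, ratio {ratio:.1f}")

# Only with a 0% baseline does "twice the remission rate" mean "twice as helpful";
# as spontaneous remission rises, the ratio of benefits grows without bound.
```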
I’ll close with two quotes that I liked from the book:
The goal of an empirical economist should not be to determine the truthfulness of a model but rather the domain of its usefulness – Edward E. Leamer
The probability that an experimental design will be replicated becomes very small once such an experiment appears in print. – Thomas D. Sterling