Book review: Error and the Growth of Experimental Knowledge by Deborah Mayo.
This book provides a fairly thoughtful theory of how scientists work, drawing on
Popper and Kuhn while improving on them. It also tries to describe a quasi-frequentist philosophy (called Error Statistics, abbreviated as ES) which poses a more serious challenge to the Bayesian Way than I’d seen before.
Mayo’s attacks on Bayesians focus more on subjective Bayesians than on objective Bayesians, and they expose some real problems with subjectivists’ willingness to treat arbitrary priors as valid. The criticisms that apply to objective Bayesians (such as E.T. Jaynes) helped me understand why frequentism is taken seriously, but didn’t convince me to change my view that the Bayesian interpretation is more rigorous than the alternatives.
Mayo shows that much of the disagreement stems from differing goals. ES is designed for scientists whose main job is generating better evidence via new experiments. ES uses statistics for generating severe tests of hypotheses. Bayesians take evidence as a given and don’t think experiments deserve special status within probability theory.
The most important difference between these two philosophies is how they treat experiments with “stopping rules” (e.g. tossing a coin until it produces a pre-specified pattern instead of doing a pre-specified number of tosses). Each philosophy tells us to analyze the results in ways that seem bizarre to people who only understand the other philosophy. This subject is sufficiently confusing that I’ll write a separate post about it after reading other discussions of it.
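To make the divergence concrete, here is a toy calculation (my own sketch, not from the book) using the standard textbook example: the same data, 9 heads in 12 tosses, analyzed under a fixed-sample design versus a "toss until the 3rd tail" stopping rule. The p-values differ, and cross the conventional 0.05 line in opposite directions, while the Bayesian likelihood ratio is unaffected by the design.

```python
from math import comb

heads, tails = 9, 3
n = heads + tails  # 12 tosses observed, 9 of them heads

# Design 1: fixed number of tosses (n = 12).
# One-sided p-value: probability of >= 9 heads under a fair coin.
p_fixed = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n

# Design 2: keep tossing until the 3rd tail appears (a stopping rule).
# Same data, but the p-value is now P(the 3rd tail takes >= 12 tosses).
p_stop = 1.0 - sum(comb(m - 1, tails - 1) * 0.5**m for m in range(tails, n))

# Bayesian likelihood ratio for H1: P(heads) = 0.75 vs H0: P(heads) = 0.5.
# The stopping rule's combinatorial factor cancels from the ratio, so this
# number is identical under both designs -- the likelihood principle.
def likelihood(p):
    return p**heads * (1 - p)**tails

lr = likelihood(0.75) / likelihood(0.5)

print(f"fixed-n p-value:       {p_fixed:.4f}")  # 0.0730, not significant at 0.05
print(f"stopping-rule p-value: {p_stop:.4f}")   # 0.0327, significant at 0.05
print(f"likelihood ratio:      {lr:.2f}")       # 4.81 under either design
```

So an error statistician's verdict depends on the experimenter's intentions about when to stop, while a Bayesian's does not; each side finds the other's position bizarre, which is the confusion the paragraph above alludes to.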
She constructs a superficially serious disagreement where Bayesians say that evidence increases the probability of a hypothesis while ES says the evidence provides no support for the hypothesis, because the hypothesis is "Gellerized" (i.e. constructed after the fact to fit whatever data occurred). Objective Bayesians seem to handle this via priors which reflect the use of old evidence. Marcus Hutter has a description of a general solution in his paper On Universal Prediction and Bayesian Confirmation, but I’m concerned that Bayesians may be more prone to mistakes in implementing such an approach than people who use ES.
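A toy illustration of the problem (my own sketch, not an example from the book): a hypothesis built after the fact to match the data gets an enormous naive Bayes factor, even though the procedure that produced it fits the data perfectly no matter what is true.

```python
import random

random.seed(0)
flips = tuple(random.randint(0, 1) for _ in range(20))  # the observed data

# Gellerized hypothesis H_g, constructed AFTER seeing the data:
# "the process was rigged to output exactly this sequence."
lik_gellerized = 1.0           # H_g predicts the observed data with certainty
lik_fair = 0.5 ** len(flips)   # a fair coin's likelihood for any 20-flip sequence

bayes_factor = lik_gellerized / lik_fair  # 2**20, over a million to one

# The ES objection: the procedure "fit a hypothesis to whatever occurred"
# succeeds with probability 1 regardless of the truth, so the test has
# zero severity and the perfect fit counts for nothing as evidence.
print(f"naive Bayes factor for H_g: {bayes_factor:,.0f}")
```

The objective-Bayesian reply sketched above amounts to a prior penalty: since there is one such Gellerized hypothesis for each of the 2^20 possible sequences, a prior of about 2^-20 on H_g cancels the Bayes factor, leaving the posterior unmoved.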
Mayo occasionally dismisses the Bayesian Way as wrong due to what look to me like differing uses of concepts such as evidence. The Bayesian notion of very weak evidence only seems wrong given her assumption that the scientific concept of evidence is the “right” concept. This kind of confusion makes me wish Bayesians had invented a different word for the non-prior information that gets fed into Bayes Theorem.
One interesting and apparently valid criticism Mayo makes is that Bayesians treat the evidence that they feed into Bayes Theorem as if it had a probability of one, contrary to the usual Bayesian mantra that all data have a probability and the use of zero or one as a probability is suspect. This is clearly just an approximation for ease of use. Does it cause problems in practice? I haven’t seen a good answer to this.
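One standard answer to how a Bayesian can avoid treating evidence as certain is Jeffrey conditionalization, which weights the ordinary posterior by how confident one actually is in the evidence. Here is a minimal sketch with hypothetical numbers of my choosing; whether this resolves Mayo's criticism in practice is exactly the open question above.

```python
def bayes_posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Standard conditioning: treat the evidence E as if it were certain."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

def jeffrey_posterior(prior_h, p_e_given_h, p_e_given_not_h, q):
    """Jeffrey conditionalization: the evidence E only reaches probability q."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    p_h_given_e = p_e_given_h * prior_h / p_e
    p_h_given_not_e = (1 - p_e_given_h) * prior_h / (1 - p_e)
    # Mix the two conditional posteriors by the credence in E itself.
    return q * p_h_given_e + (1 - q) * p_h_given_not_e

# Hypothetical numbers: H starts at 30%, and E is 4x likelier under H.
certain = bayes_posterior(0.3, 0.8, 0.2)          # evidence taken as given
shaky = jeffrey_posterior(0.3, 0.8, 0.2, q=0.9)   # 10% chance E was misread

print(f"P(H) if E is certain: {certain:.3f}")  # 0.632
print(f"P(H) if E is shaky:   {shaky:.3f}")    # 0.578, a weaker update
```

With q = 1 the formula reduces to ordinary Bayes' theorem, so conditioning on evidence "as if it had probability one" is the limiting case of this more careful update.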
Mayo claims that ES can apportion blame for an anomalous test result (does it disprove the hypothesis? or did an instrument malfunction?) without dealing with prior probabilities. For example, in the classic 1919 eclipse test of relativity, supporters of Newton’s theory agreed with supporters of relativity about which data to accept and which to reject, whereas Bayesians would have disagreed about the probabilities to assign to the evidence. If I understand her correctly, this also means that if the data had shown light being deflected at a 90 degree angle to what both theories predict, ES scientists wouldn’t look any harder for instrument malfunctions.
Mayo complains that when different experimenters reach different conclusions (due to differing experimental results) “Lindley says all the information resides in an agent’s posterior probability”. This may be true in the unrealistic case where each experimenter perfectly incorporates all relevant evidence into their priors. But a much better Bayesian way to handle differing experimental results is to locate all the information created by experiments in the likelihood ratios that they produce.
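A minimal sketch of that last point, with hypothetical numbers: each lab reports the likelihood ratio its data assigns to H1 over H0, independent results combine by multiplication, and any remaining disagreement between agents is confined to their priors rather than to the experiments.

```python
# Hypothetical likelihood ratios reported by two labs for H1 vs H0.
lr_lab_a = 6.0   # lab A's data favor H1 six to one
lr_lab_b = 0.5   # lab B's data mildly favor H0

# Independent experiments combine by multiplying likelihood ratios.
combined_lr = lr_lab_a * lr_lab_b   # 3.0: the pooled evidence favors H1

def posterior_prob(prior_prob, lr):
    """Update prior odds by the likelihood ratio; return posterior probability."""
    odds = prior_prob / (1 - prior_prob) * lr
    return odds / (1 + odds)

# Two agents with different priors, updating on the SAME pooled evidence:
skeptic = posterior_prob(0.1, combined_lr)   # 0.25
believer = posterior_prob(0.5, combined_lr)  # 0.75

print(f"pooled LR: {combined_lr}, skeptic: {skeptic:.2f}, believer: {believer:.2f}")
```

The experiments' entire contribution is the shared, design-independent likelihood ratios; the posteriors differ only because the priors do, which is a cleaner division of labor than hiding everything inside each agent's posterior.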
Many of the disagreements could be resolved by observing which approach to statistics produced better results. The best Mayo offers seems to be a mention of an obscure claim by Peirce that Bayesian methods had a consistently poor track record in (19th century?) archaeology. I’m disappointed that I haven’t seen a good comparison of more recent uses of the competing approaches.