Science and Technology

Book review: The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives by Stephen Ziliak and Deirdre McCloskey.
This book provides strong arguments that scientists often use tests of statistical significance as a ritual that substitutes for thought about how hypotheses should be tested.
Some of the practices they criticize are clearly foolish, such as treating data that fall slightly short of statistical significance as grounds for concluding a hypothesis is false. But for other practices they attack, it’s unclear whether we can expect scientists to be reasonable enough to do better.
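To make that first mistake concrete, here’s a minimal sketch in Python with made-up numbers (not from any study the book discusses), showing how a substantial estimated effect can coexist with a p-value just above the 0.05 threshold:

```python
# Made-up numbers: a substantial estimated effect whose p-value falls just
# short of the 0.05 ritual threshold. "Not significant" is not evidence
# that the effect is zero.
from scipy import stats

treatment = [11, 12, 10, 13, 9, 12, 11, 10]
control = [10, 9, 11, 8, 10, 9, 11, 10]

t, p = stats.ttest_ind(treatment, control)  # pooled two-sample t-test
effect = sum(treatment) / len(treatment) - sum(control) / len(control)
print(f"estimated effect = {effect:.2f}, p = {p:.3f}")  # effect = 1.25, p just above 0.05
# The ritual the authors criticize would call the hypothesis refuted here,
# even though the point estimate favors a real effect.
```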
Much of the book is a history of how this situation arose. That might be valuable if it provided insights into what rules could have prevented the problems, but it is mainly devoted to identifying heroes and villains. It seems strange that economists would pay so little attention to the incentives that might be responsible for the problems.
Instead of blaming the problems primarily on one influential man (R.A. Fisher), I’d suggest asking what distinguishes the areas of science where the problems are common from those where they are largely absent. It appears that the problems are worst in areas where acquiring additional data is hard and where powerful interest groups might benefit from false conclusions. That leads me to wonder whether scientists are reacting to a risk that they’ll be perceived as agents of drug companies, political parties, etc.
The book sometimes mentions anti-commercial attitudes among the villains, but fails to ask whether that might be a symptom of a desire for “pure” science that is divorced from real world interests. Such a desire might cause many of the beliefs that the authors are fighting.
The book does not adequately address the concern that scientists in those fields are sufficiently vulnerable to corruption that abandoning easily applied rules would leave us with less accurate conclusions.
The authors claim the problems have been getting worse, and show some measures by which that seems true. But I suspect their measures fail to capture some improvement that has been happening as the increasing pressure to follow the ritual has caused papers that would previously have been purely qualitative to use quantitative tests that reject the worst ideas.
The book seems somewhat sloppy in its analysis of specific examples. When interpreting data from a study where scientists decided there was no effect because the evidence fell somewhat short of statistical significance, it claims the data show “St. John’s-wort is on average twice as helpful as the placebo”. But the data would support that claim only if the remission rate with no treatment were known to be zero. It’s likely that some or all of the alleged placebo effect was due to factors unrelated to treatment, such as spontaneous remission. And their use of the word “show” suggests stronger evidence than the data provide.
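Some arithmetic with hypothetical remission rates (not the study’s actual numbers) shows why the baseline matters:

```python
# Hypothetical rates. A drug arm with twice the placebo arm's remission rate
# is "twice as helpful" only if remission with no treatment would be zero.
drug_remission = 0.50         # assumed remission rate in the drug arm
placebo_remission = 0.25      # assumed remission rate in the placebo arm
spontaneous_remission = 0.20  # assumed remission rate with no treatment

ratio_of_rates = drug_remission / placebo_remission          # 2.0
drug_benefit = drug_remission - spontaneous_remission        # 0.30
placebo_benefit = placebo_remission - spontaneous_remission  # 0.05
print(ratio_of_rates, drug_benefit / placebo_benefit)        # 2.0 vs 6.0
# With a nonzero baseline, the ratio of remission rates tells us little about
# how much more helpful the drug is than the placebo.
```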
I’ll close with two quotes that I liked from the book:

The goal of an empirical economist should not be to determine the truthfulness of a model but rather the domain of its usefulness. – Edward E. Leamer

The probability that an experimental design will be replicated becomes very small once such an experiment appears in print. – Thomas D. Sterling

Yet another hypothesis for why the industrial revolution happened in Europe is that higher infectious disease levels elsewhere caused most cultures that might otherwise have produced technological development to become more collectivist, in order to reduce the spread of disease.
Collectivism may have inhibited scientific and technological innovation by discouraging trial-and-error learning and ideas which signal an absence of group loyalty.

collectivists make sharp distinctions between coalitional in-groups and out-groups, whereas among individualists the in-group/out-group distinction is typically weaker (Gelfand et al. 2004). A consequence is that collectivists are more wary of contact with foreigners

I suspect this effect is real but not strong enough to be the primary cause of the industrial revolution. It does, however, provide a good clue about why a relatively tropical region such as the Yangtze River Delta lagged behind more temperate England.

Steve Omohundro has recently written a paper and given a talk (a video should become available soon) on AI ethics, with arguments whose central concerns resemble Eliezer Yudkowsky’s. I find Steve’s style more organized and more likely to convince mainstream researchers than Eliezer’s best attempt so far.
Steve avoids Eliezer’s suspicious claims about how fast AI will take off, and phrases his arguments in ways that are largely independent of the takeoff speed. But a sentence or two in the conclusion of his paper suggests that he is leaning toward solutions which assume multiple AIs will be able to safeguard against a single AI imposing its goals on the world. He doesn’t appear to have a good reason to consider this assumption reliable, but at least he doesn’t show the kind of disturbing certainty that Eliezer has about the first self-improving AI becoming powerful enough to take over the world.
Possibly the most important news in Steve’s talk was his statement that he had largely stopped working to create intelligent software due to his concerns about safely specifying goals for an AI. He indicated that one important insight that contributed to this change of mind came when Carl Shulman pointed out a flaw in Steve’s proposal for a utility function which included a goal of the AI shutting itself off after a specified time (the flaw involves a small chance of physics being different from apparent physics and how the AI will evaluate expected utilities resulting from that improbable physics).
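For readers who want the flavor of that flaw, here is a minimal sketch with entirely made-up numbers (Steve’s actual utility function and Carl’s exact argument aren’t reproduced here) of how an improbable-physics branch can dominate an expected-utility comparison:

```python
# Made-up numbers illustrating the general failure mode: a branch with tiny
# probability but extreme payoff can dominate the expected utilities, so a
# "shut yourself off after time T" goal may fail to bind.
p_weird = 1e-12      # assumed chance the AI's model of physics is wrong
u_shutdown = 1.0     # assumed utility of shutting down on schedule
u_stay_normal = 0.0  # staying on violates the shutdown goal (assumed)
u_stay_weird = 1e15  # assumed huge payoff from staying on if physics is weird

eu_shutdown = (1 - p_weird) * u_shutdown + p_weird * 0.0
eu_stay = (1 - p_weird) * u_stay_normal + p_weird * u_stay_weird
print(eu_shutdown, eu_stay)  # ~1.0 vs ~1000.0: the AI prefers to stay on
```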

Book review: Seeing Red: A Study in Consciousness (Mind/Brain/Behavior Initiative) by Nicholas Humphrey.
This book provides a clear and simple description of phenomena that are often described as qualia, and a good guess about how and why they might have evolved as convenient ways for one part of a brain to get useful information from other parts. It uses examples of blindsight to clarify the difference between using sensory input and being aware of that input.
I liked the description of consciousness as being “temporally thick” rather than being about an instantaneous “now”, suggesting that it includes pieces of short-term memory and possibly predictions about the next few seconds.
The book won’t stop people from claiming that there’s still something mysterious about qualia, but it will make it hard for them to claim that they have a well-posed question that hasn’t been answered. It avoids most debates over meanings of words by usually sticking to simpler and less controversial words than qualia, and only using the word consciousness in ways that are relatively uncontroversial.
The book is short and readable, yet its important points are concise enough that they could be adequately expressed in an even shorter essay.

Book review: The First Word: The Search for the Origins of Language by Christine Kenneally.
This book contains a few good ideas, but spends more time than I want discussing the personalities and politics that have been involved in the field.
It presents some good arguments against the “big bang” theory of the origin of human language (which suggests that one mutation may have created syntactic abilities that don’t correspond to anything in other species), mainly by presenting evidence that human language is not a monolithic feature, and that most aspects of it resemble features which can be seen in other species. For example, some of our syntactic ability involves reusing parts of the brain that provide motor control.
I’m uncertain whether the “big bang” theory she argues against is actually believed by any serious scholar, because those who may have advocated it haven’t articulated much of a theory (partly because they think there’s too little evidence to say much about the origin of language).
The most valuable idea I got from the book was the possibility that human language may have developed as a byproduct of a sophisticated theory of mind. Other apes seem to get less benefit from communication: with only the limited theory of mind that a typical chimp has, there is little that improved communication by one individual can do to increase cooperation between individuals.

Tim Freeman has a paper which clarifies many of the issues that need to be solved for humans to coexist with a superhuman AI. It comes close to what we would need if we had unlimited computing power. I will try to amplify some of the criticisms of it from the sl4 mailing list.
It errs on the side of our current intuitions about what I consider to be subgoals, rather than trusting the AI’s reasoning to find good subgoals to meet primary human goal(s). Another way to phrase that would be that it fiddles with parameters to get special-case results that fit our intuitions rather than focusing on general purpose solutions that would be more likely to produce good results in conditions that we haven’t yet imagined.
For example, concern about whether the AI pays the grocer seems misplaced. If our current intuitions about property rights continue to be good guidelines for maximizing human utility in a world with a powerful AI, why would that AI not reach that conclusion by inferring human utility functions from observed behavior and modeling the effects of property rights on human utility? If not, then why shouldn’t we accept that the AI has decided on something better than property rights (assuming our other methods of verifying that the AI is optimizing human utility show no flaws)?
Is it because we lack decent methods of verifying the AI’s effects on phenomena such as happiness that are more directly related to our utility functions? If so, it would seem to imply that we have an inadequate understanding of what we mean by maximizing utility. I didn’t see a clear explanation of how the AI would infer utility functions from observing human behavior (maybe the source code, which I haven’t read, clarifies it), but that appears to be roughly how humans at their best make the equivalent moral judgments.
I see similar problems with designing the AI to produce the “correct” result with Pascal’s Wager. Tim says “If Heaven and Hell enter into a decision about buying apples, the outcome seems difficult to predict”. Since humans have a poor track record at thinking rationally about very small probabilities and phenomena such as Heaven that are hard to observe, I wouldn’t expect AI unpredictability in this area to be evidence of a problem. It seems more likely that humans are evaluating Pascal’s Wager incorrectly than that a rational AI which can infer most aspects of human utility functions from human behavior will evaluate it incorrectly.

Book review: The Robot’s Rebellion: Finding Meaning in the Age of Darwin by Keith E. Stanovich.
This book asks us to notice the conflicts between the goals our genes created us to serve and the goals that we as individuals benefit from achieving. Its viewpoint is fairly novel. Little of the book’s substance seemed new, but in a number of places it provides better ways of communicating ideas than I had previously seen.
The title led me to hope that the book would present a very ambitious vision of how we might completely free ourselves from genes and Darwinian evolution, but his advice focuses on modest nearer-term benefits we can get from the study of heuristics and biases. The advice consists mainly of elaborations on the ideas of being rational and of using scientific methods instead of gut reactions when those approaches give conflicting results.
He does a good job of describing the conflicts between first order desires (e.g. eating sugar) and higher order desires (e.g. the desire not to desire unhealthy amounts of sugar), and why there’s no easy rule to decide which of those desires deserves priority.
He isn’t entirely fair to groups of people that he disagrees with. I was particularly annoyed by his claim that “economics vehemently resists the notion that first-order desires are subject to critique”. What economics resists is the idea that person X is a better authority than person Y about what Y’s desires are or ought to be. Economics mostly avoids saying anything about whether a person should want to alter his desires, and I expect those issues to be dealt with better by other disciplines.
One of the better ideas in the book was to compare the effort put into testing people’s intelligence with the effort devoted to testing their rationality. He mentions many tests that would provide information about how well a person has overcome biases, and points out that such information might be valuable to schools deciding which students to admit and to employers deciding whom to hire. I wish he had provided a good analysis of how well those tests would work if people trained to do well on them. I’d expect some wide variations – tests for overconfidence can be made to work fairly well, but I’m concerned that people would learn to pass tests such as the Wason test without changing their behavior in conditions where they’re not alert to these problems.

Book review: What is Intelligence?: Beyond the Flynn Effect by James Flynn.
This book may not be the final word on the Flynn Effect, but it makes enough progress in that direction that it is no longer reasonable to describe the Flynn Effect as a mystery. I’m surprised at how much Flynn has changed since the last essay of his that I read (a somewhat underwhelming chapter in The Rising Curve, edited by Ulric Neisser).
Flynn presents evidence of very divergent trends in subsets of IQ tests, and describes a good hypothesis for explaining that divergence: increasing cultural pressure for abstract, scientific thought may have created increasing effort to develop certain kinds of cognitive skills that were less important in prior societies.
This helps explain the puzzle of why the Flynn Effect doesn’t imply that 19th century society consisted primarily of retarded people – there has been relatively little change in how people handle concrete problems that constituted the main challenges to average people then. He presents an interesting example of how to observe cognitive differences between modern U.S. society and societies that are very isolated, showing big differences in how they handle some abstract questions.
He also explains why IQ differences over time look so different from the environmental effects on IQ that methods such as twin studies measure: the twin studies capture relatively unimportant influences such as different parenting styles, but don’t capture the major cultural changes that distinguish the 19th century from today.
None of this suggests that the concept of g is unimportant or refers to something unreal, but a strong focus on g has helped blind some people to the ideas that are needed to understand the Flynn Effect.
Flynn also reports that the rise in IQs is, at least by some measures, fairly uniform across the entire range of IQs (contrary to The Bell Curve’s report that it appeared to affect mainly the low end of the IQ spectrum). This weakens one of the obvious criticisms of David Friedman’s conjecture that modern obstetrics caused the Flynn Effect by reducing the birth related obstacles to large skulls (although if that were the main cause of the Flynn Effect, I’d expect the IQ increase to be largest at the high end of the IQ spectrum).
It also weakens the inference I drew from Fogel’s book on malnutrition. Flynn does little to directly address Fogel’s argument that the benefits of improved nutrition show up with longer delays than most people realize, but he does report some evidence that the Flynn Effect continues even when the height increases that Fogel relies on to measure the benefits of nutrition stop.
Flynn reports that the Flynn Effect has probably stopped in Scandinavia but hasn’t shown signs of stopping in the U.S. His comments on the future of IQ gains are unimpressive.
There are a few disappointing parts near the end of the book where he wanders into political issues in which he has little expertise; his fairly ordinary opinions there are no better than a typical academic discussion of politics. In spite of that, the book is fairly short and can be read quickly.
One interesting experiment that Flynn discusses tested whether students preferred one dollar now or two dollars next week. The results were twice as useful as IQ tests in predicting the students’ grades. Flynn infers that this is a test of self-control. I presume that is part of what it tests, but I wonder whether it also tests whether the students realized that the testers’ word could be trusted (due to a better ability to analyze the relevant incentives? or to a general willingness to trust strangers because the ways they met people selected for trustworthy people?).
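Some back-of-the-envelope arithmetic (mine, not Flynn’s) suggests why the choice can’t reflect an ordinary rate of time preference:

```python
# Preferring $1 now to $2 next week implies discounting next-week money by at
# least 50%, a rate that becomes absurd if applied consistently over a year.
weekly_factor = 1 / 2                # implied value today of a dollar next week
annual_factor = weekly_factor ** 52  # the same impatience compounded for a year
print(f"implied annual discount factor <= {annual_factor:.2e}")  # ~2.22e-16
```

So the test is more plausibly measuring self-control or trust than any considered discounting of future money.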

Book review: A Farewell to Alms: A Brief Economic History of the World by Gregory Clark.
This book provides very interesting descriptions of the Malthusian era, and a surprising explanation of how parts of the world escaped Malthusian conditions starting around 1800. The process involved centuries of wealthier people outreproducing the poor, and passing on traits/culture which were better adapted to modern living. This process almost certainly made some contribution to the industrial revolution, but I can’t find a plausible way to guess the magnitude of the contribution. Clark is not the kind of author I trust to evaluate that magnitude.
His arguments against other explanations of the industrial revolution are unconvincing. His criticisms of institutional explanations imply at most that those explanations are incomplete. But combining those explanations with a normal belief that knowledge/technology matters produces a model against which his criticisms are ineffective. (See Bryan Caplan for more detailed replies about institutional explanations).
He makes interesting claims about how differently we should think about the effects, in a Malthusian world, of phenomena that would be obviously bad today. E.g. he thinks the Black Plague had good long-term effects. He made me rethink those effects, but he only convinced me that they weren’t as bad as commonly believed. His confidence that they were good depends on some unlikely quantitative assumptions about the benefits of increased income per capita, and he seems oblivious to the numerous problems with evaluating those assumptions. His comments in the last few pages of the book about how little average happiness has changed over time lead me to doubt that his beliefs on this subject are consistent.
While many parts of the book appear at first glance to be painting a very unpleasant picture of the Malthusian era, he ends up concluding it wasn’t a particularly bad era, and he describes people as being farther from starvation than Robert Fogel indicates in The Escape from Hunger and Premature Death, 1700-2100. Their ability to reach somewhat different conclusions by looking at different sets of evidence implies that there’s more uncertainty than they admit.
He does a neat job of pointing out that economists have often overstated the comparative advantage argument against concerns that labor will be replaced by machines: horses were a clear example of laborers who suffered massive unemployment a century ago when the value of their labor dropped below the cost of their food.