Book review: The Measure of All Minds: Evaluating Natural and Artificial Intelligence, by José Hernández-Orallo.
Much of this book consists of surveys of the psychometric literature. But the best parts involve original results that bring more rigor and generality to the field. Those parts approach the quality that I saw in Judea Pearl’s Causality and E.T. Jaynes’ Probability Theory, but Measure of All Minds achieves a smaller fraction of its author’s ambitions, and is sometimes poorly focused.
Hernández-Orallo has an impressive ambition: measure intelligence for any agent. The book mentions a wide variety of agents, such as normal humans, infants, deaf-blind humans, human teams, dogs, bacteria, Q-learning algorithms, etc.
The book is aimed at a narrow and fairly unusual target audience. Much of it reads like it’s directed at psychology researchers, but the more original parts of the book require thinking like a mathematician.
The survey part seems pretty comprehensive, but I wasn’t satisfied with his ability to distinguish the valuable parts of that literature from the rest (although he did a good job of ignoring the politicized rants that plague many discussions of this subject).
For nearly the first 200 pages of the book, I was mostly wondering whether the book would address anything important enough for me to want to read to the end. Then I reached an impressive part: a description of an objective IQ-like measure. Hernández-Orallo offers a test (called the C-test) which:
- measures a well-defined concept: sequential inductive inference,
- defines the correct responses using an objective rule (based on Kolmogorov complexity),
- has essentially no arbitrary cultural bias (the main feature that looks like an arbitrary cultural bias is the choice of alphabet and its order),
- and gives results in objective units (based on Levin’s Kt).
Yet just when I got my hopes up for a major improvement in real-world IQ testing, he points out that what the C-test measures is too narrow to be called intelligence: there’s a 960-line Perl program that exhibits human-level performance on this kind of test, without resembling a breakthrough in AI.
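To give a feel for what "sequential inductive inference scored with a Kt-style measure" can look like, here is a minimal Python sketch. It is not the book's C-test: the rule family (cyclic step patterns over the lowercase alphabet), the description-length formula, and the names `run_rule`, `toy_cost`, and `predict_next` are all illustrative assumptions. The only part taken from the standard definition is the shape of Levin's Kt, which adds a program's length to the logarithm of its running time.

```python
import itertools
import math
import string

ALPHABET = string.ascii_lowercase

def run_rule(start, pattern, n):
    """Generate n letters: begin at `start`, then apply the step pattern cyclically."""
    idx = ALPHABET.index(start)
    out = [start]
    for k in range(n - 1):
        idx = (idx + pattern[k % len(pattern)]) % len(ALPHABET)
        out.append(ALPHABET[idx])
    return "".join(out)

def toy_cost(pattern, n):
    """Kt-flavoured cost: a toy description length plus log2 of the steps executed."""
    length_bits = sum(1 + abs(s) for s in pattern)  # hypothetical encoding, not the book's
    return length_bits + math.log2(n)

def predict_next(seq):
    """Return (next_letter, pattern) for the cheapest pattern that reproduces seq, or None."""
    n = len(seq) + 1
    candidates = []
    for plen in (1, 2):  # assumed hypothesis space: patterns of one or two steps
        for pattern in itertools.product(range(-3, 4), repeat=plen):
            full = run_rule(seq[0], pattern, n)
            if full[:-1] == seq:
                candidates.append((toy_cost(pattern, n), pattern, full[-1]))
    if not candidates:
        return None
    _, pattern, nxt = min(candidates)
    return nxt, pattern

print(predict_next("aceg"))   # ('i', (2,))
print(predict_next("acdfg"))  # ('i', (2, 1))
```

In this toy every rule runs for the same number of steps, so the time term does not change the ranking; it is there only to show the shape of the score. The sketch also illustrates the reviewer's point in miniature: a small, narrow program can answer such items without being intelligent in any broader sense.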
The book provides rigorous analysis of some aspects of the AI foom debate. He argues that at least one subset of intelligence requires exponentially increasing resources for linear gains in an objective measure of that subset. His overall claim is obviously too narrow to say anything conclusive about foom. The argument made me somewhat more confident that foom is only plausible in the presence of significant hardware overhang. But the book also convinced me that hardware overhang is more likely than I previously believed.
His measures seem sufficiently IQ-like that I believe his argument implies that exponentially increasing resources are generally needed for linear increases in IQ. Unfortunately, linear gains in IQ don’t imply anything simple about gains in ability at world domination.
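The "exponential resources for linear gains" relationship can be written out as a short worked equation. This is only a sketch under assumed notation: the symbols $s$ (score), $R$ (resources), $R_0$ (baseline resources), and $k$ (points gained per doubling) are illustrative and do not come from the book.

$$
R(s) = R_0 \, 2^{s/k}
\quad\Longleftrightarrow\quad
s(R) = k \log_2\!\left(\frac{R}{R_0}\right)
$$

Under this assumed form, every doubling of $R$ adds the same fixed amount $k$ to the score, so a thousandfold increase in resources buys only about $10k$ points.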
Hernández-Orallo complains about Bostrom’s lack of rigor in Superintelligence. Most authors would be making selective demands for rigor if they made that complaint, but Hernández-Orallo seems fairly consistent about wanting to focus on rigorous analysis, even when that implies not tackling the most important problems we face.
He mentions that compression is similar to intelligence, but claims it isn’t an adequate measure of intelligence. His only clear counter-example is that some algorithms produce better lossless compression than humans, without being more intelligent than humans. I find that unconvincing: humans don’t try to do lossless compression as well as they do lossy compression, so it’s hard to evaluate human compression abilities in ways that are both fair and rigorous. I have vague hopes that someone can find a good way to objectively measure human compression abilities, but it’s unclear whether that would be worth the effort.
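For the algorithmic side of that comparison, lossless compression ability is easy to measure. The sketch below uses Python's standard `zlib` module as a stand-in compressor (an assumption made for illustration, not something the book uses); the helper name `compressed_bits` is hypothetical. The compressed size is only a crude upper bound on a string's Kolmogorov complexity, and nothing here addresses the harder problem of measuring human compression fairly.

```python
import os
import zlib

def compressed_bits(data: bytes) -> int:
    """Size in bits of the zlib-compressed data (level 9, the strongest setting)."""
    return 8 * len(zlib.compress(data, 9))

regular = b"ab" * 500        # 1000 bytes with an obvious pattern
noisy = os.urandom(1000)     # 1000 bytes with no pattern to exploit

print(compressed_bits(regular), "bits for the regular input")
print(compressed_bits(noisy), "bits for the random input")
```

The regular input shrinks to a small fraction of its 8000 raw bits, while the random input does not shrink at all.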
The farther Hernández-Orallo strays from his areas of expertise (which isn’t often), the less convincing he sounds. For example, he suggests that robot waiters will learn how to get higher tips from customers. That assumes a rather strange model of why tipping happens; I predict that tipping will be obsolete in restaurants without human waiters.
Some other interesting ideas that I’m too busy to discuss now:
- He conjectures that, contrary to Spearman’s law of diminishing returns, minds in general (i.e. typical AIs) will show a higher g-factor as their intelligence increases.
- Graphs of crowd IQ as a function of crowd size.
- This paper claims that “High general intelligence has independently evolved at least four times, with convergent evolution in capuchins, baboons, macaques and great apes.”
The book clarified my knowledge of intelligence testing, and slightly improved my understanding of what intelligence is. It’s essential reading for a rather narrow set of people.
Footnote: in the description of the C-test above, I use the term “arbitrary” to distinguish undesirable biases from the cultural biases toward abstract thought (which belong in IQ tests), and from possible biases toward test subjects who are motivated to score well (that’s a messy subject, and mostly beyond the scope of the book).