Book review: Superintelligence: Paths, Dangers, Strategies, by Nick Bostrom.
This book is substantially more thoughtful than previous books on AGI risk, and substantially better organized than the previous thoughtful writings on the subject.
Bostrom’s discussion of AGI takeoff speed is disappointingly philosophical. Many sources (most recently CFAR) have told me to rely on the outside view to forecast how long something will take. We’ve got lots of weak evidence about the nature of intelligence, about how it evolved, and about how various kinds of software improve, providing data for an outside view. Bostrom assigns a vague but implausibly high probability to AI going from human-equivalent to more powerful than humanity as a whole in days, with little attention to this kind of empirical check.
I’ll discuss this more in a separate post which is more about the general AI foom debate than about this book.
Bostrom’s discussion of how takeoff speed influences the chance of a winner-take-all scenario makes it clear that disagreements over takeoff speed are pretty much the only cause of my disagreement with him over the likelihood of a winner-take-all outcome. Other writers aren’t as explicit about this. I suspect those who assign substantial probability to a winner-take-all outcome even if takeoff is slow will wish he’d analyzed this in more detail.
I’m less optimistic than Bostrom about monitoring AGI progress. He says “it would not be too difficult to identify most capable individuals with a long-standing interest in [AGI] research”. AGI might require enough expertise for that to be true, but if AGI surprises me by only needing modest new insights, I’m concerned by the precedent of Tim Berners-Lee creating a global hypertext system while barely being noticed by the “leading” researchers in that field. Also, the large number of people who mistakenly think they’ve been making progress on AGI may obscure the competent ones.
He seems confused about the long-term trends in AI researcher beliefs about the risks: “The pioneers of artificial intelligence … mostly did not contemplate the possibility of greater-than-human AI” seems implausible; it’s much more likely they expected it but were either overconfident about it producing good results or fatalistic about preventing bad results (“If we’re lucky, they might decide to keep us as pets” – Marvin Minsky, LIFE Nov 20, 1970).
The best parts of the book clarify many issues related to ensuring that an AGI does what we want.
He catalogs more approaches to controlling AGI than I had previously considered, including tripwires, oracles, and genies, and clearly explains many limits to what they can accomplish.
He briefly mentions the risk that the operator of an oracle AI would misuse it for her personal advantage. Why should we have less concern about the designers of other types of AGI giving them goals that favor the designers?
If an oracle AI can’t produce a result that humans can analyze well enough to decide (without trusting the AI) that it’s safe, why would we expect other approaches (e.g. humans writing the equivalent seed AI directly) to be more feasible?
He covers a wide range of ways we can imagine handling AI goals, including strange ideas such as telling an AGI to use the motivations of superintelligences created by other civilizations.
He does a very good job of discussing what values we should and shouldn’t install in an AGI: the best decision theory plus a “do what I mean” dynamic, but not a complete morality.
I’m somewhat concerned by his use of “final goal” without careful explanation. People who anthropomorphize goals are likely to read at least the first few references to “final goal” as if it worked like a human goal, i.e. something that the AI might want to modify if it conflicted with other goals.
It’s not clear how much of these chapters depends on a winner-take-all scenario. I get the impression that Bostrom doubts we can do much about the risks associated with scenarios where multiple AGIs become superhuman. This seems strange to me. I want people who write about AGI risks to devote more attention to whether we can influence whether multiple AGIs coalesce into a singleton, and to how they would treat lesser intelligences. Designing AGIs to reflect the values we want seems almost as desirable in scenarios with multiple AGIs as in the winner-take-all scenario (I’m unsure what Bostrom thinks about that). In a world with many AGIs with unfriendly values, what can humans do to bargain for a habitable niche?
He has a chapter on worlds dominated by whole brain emulations (WBE), probably inspired by Robin Hanson’s writings but with more focus on evaluating risks than on predicting the most probable outcomes. Since it looks like we should still expect an em-dominated world to be replaced at some point by AGI(s) that are designed more cleanly and able to self-improve faster, this isn’t really an alternative to the scenarios discussed in the rest of the book.
He treats starting with “familiar and human-like motivations” (in an augmentation route) as an advantage. Judging from our experience with humans who take over large countries, a human-derived intelligence that conquered the world wouldn’t be safe or friendly, although it would be closer to my goals than a smiley-face maximizer. The main advantage I see in a human-derived superintelligence would be a lower risk of it self-improving fast enough for the frontrunner advantage to be large. But that also means it’s more likely to be eclipsed by a design more amenable to self-improvement.
I’m suspicious of the implication (figure 13) that the risks of WBE will be comparable to AGI risks.
- Is that mainly due to “neuromorphic AI” risks? Bostrom’s description of neuromorphic AI is vague, but my intuition is that human intelligence isn’t flexible enough to easily get the intelligence part of WBE without getting something moderately close to human behavior.
- Is the risk of uploaded chimp(s) important? I have some concerns there, but Bostrom doesn’t mention it.
- How about the risks of competitive pressures driving out human traits (discussed more fully/verbosely at Slate Star Codex)? If WBE and AGI happen close enough together in time that we can plausibly influence which comes first, I don’t expect the time between the two to be long enough for that competition to have large effects.
- The risk that many humans won’t have enough resources to survive? That’s scary, but wouldn’t cause the astronomical waste of extinction.
Also, I don’t accept his assertion that AGI before WBE eliminates the risks of WBE. Some scenarios with multiple independently designed AGIs forming a weakly coordinated singleton (which I consider more likely than Bostrom does) appear to leave the last two risks in that list unresolved.
This book represents progress toward clear thinking about AGI risks, but much more work still needs to be done.
One argument is that, when checking a proof of its own safety produced by an oracle AI, we have to worry about intentionally hidden flaws (hidden by a potentially much smarter being), whereas when checking a human-produced proof of the oracle’s safety, we only have to worry about accidentally hidden flaws (errors).
I think that Bostrom considers most of the problems he discusses to remain relevant in cases with many AGIs, as do I. I think a singleton is not a very likely outcome, but I have little trouble finding common ground with him on many of these issues (though there are also some disagreements driven by the distinction).
With respect to risks from emulations, I think Bostrom considers human-like but radically inhuman AI quite likely. I mostly agree. Human brains don’t look that brittle with respect to various parameter changes (you can muck with a brain a shocking amount before breaking it), and we know that in some sense they aren’t that brittle, since evolution can improve them relatively quickly. So even in the most brittle case, we would probably expect an accumulation of small improvements found by brute-force search (which may or may not preserve human values). On top of that, I think there is a pretty good chance, certainly not much less than 50-50, that working examples of human brains in silico would relatively quickly allow us to design useful algorithms that were not human but leveraged the same useful principles. I’m less confident about whether this is better or worse than the status quo.
John: it seems that both an AI and a credit-seeking human researcher attempt to produce a maximally compelling argument, with correctness enforced by our ability to notice errors. That is, I don’t think that shared human values are an important aspect of ensuring the correctness of most academic work. Given that, why be more concerned about an AI’s proposals? One reason is that it’s much smarter than you. But (1) we can throttle how smart an AI is: if humans can solve problem X, then presumably an AI of human-level intelligence can solve problem X, and in that case its capacity for manipulation is no greater than usual, and we have other advantages; (2) we can leverage machine intelligence to help evaluate proposals as well as to craft them (for example, see my post here). Overall, this situation looks much better to me than our current one.
Paul, what kind of mucking with brains are you talking about? The kinds of changes caused by damage (e.g. Phineas Gage) or drugs don’t produce results I’d consider inhuman, except maybe when they significantly impair the person’s ability to communicate. The evidence from drugs suggests that the few that improve productivity (caffeine, modafinil) don’t come close to changing a person’s humanity. Generally, the more a drug changes personality, the more it reduces productivity, which suggests we’re close enough to a local optimum that improvements to uploaded minds will look more like increased speed or better input/output than like value-altering changes. What little I can infer from artificial neural nets doesn’t cause me to change that estimate much.
Are you using a much narrower concept of human than I am?
I’m less confident in my ability to evaluate the effects of knowledge from emulations on approaches such as Jeff Hawkins’.
Drugs and other interventions do have some effects on personality even if they are muted, and if you were able to experiment until you found an improvement and then repeat that dozens of times, I would expect you to end up with something relatively far from human. We may disagree about how different people empirically are on drugs. I think the important notion of “closeness” is what people would do in the long term if they acquired resources (very roughly), and I do think that this could be substantially modified over not-too-many steps similar to a lobotomy or stimulant use (though obviously it would take many more steps of the latter kind, since they are much smaller steps).
I don’t know how hard it would be to avoid relevant kinds of drift if you wanted to, my guess would be that it is non-trivial but also not that hard, and probably wouldn’t involve huge productivity losses. So I probably disagree with Bostrom on this front.
I would guess that continued evolution would yield creatures as different again from humans as humans are from chimps, and that even with a crude understanding of development it would be possible to carry out a similar process with brain emulations by experimenting. To the extent that I’m not concerned about this kind of change, I’m similarly unconcerned about chimps.
It also seems like normal processes of indoctrination and selection can lead to fairly large changes in humans (e.g. towards the end of changing their motivation), and again that those processes could potentially occur radically faster for emulations even with a relatively crude understanding of psychology.
Indoctrination is effective at changing tribal affiliation, which has important effects on what people would do if they had lots more resources. That doesn’t seem much like making them less human. I expect the important changes in emulation motives to be like that.