existential risks


Book review: Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat.

This book describes the risks that artificial general intelligence will cause human extinction, presenting the ideas propounded by Eliezer Yudkowsky in a slightly more organized but less rigorous style than Eliezer’s own.

Barrat is insufficiently curious about why many people who claim to be AI experts disagree, so he’ll do little to change the minds of people who already have opinions on the subject.

He dismisses critics as unable or unwilling to think clearly about the arguments. My experience suggests that while any one critic has normally neglected some argument, that’s often because they’ve thought about and rejected some other step in Eliezer’s reasoning, and concluded that the step they’re ignoring wouldn’t change their conclusions.

The weakest claim in the book is that an AGI might become superintelligent in hours. A large fraction of people who have worked on AGI (e.g. Eric Baum, author of What is Thought?) dismiss this as too improbable to be worth much attention, and Barrat doesn’t offer them any reason to reconsider. The rapid takeoff scenarios influence how plausible it is that the first AGI will take over the world. Barrat seems only interested in talking to readers who can be convinced we’re almost certainly doomed if we don’t build the first AGI right. Why not also pay some attention to the more complex situation where an AGI takes years to become superhuman? Should people who think there’s a 1% chance of the first AGI conquering the world worry about that risk?

Some people don’t approve of trying to build an immutable utility function into an AGI, often pointing to changes in human goals without clearly analyzing whether those are subgoals that are being altered to achieve a stable supergoal/utility function. Barrat mentions one such person, but does little to analyze this disagreement.

Would an AGI that has been designed without careful attention to safety blindly follow a narrow interpretation of its programmed goal(s), or would it (after achieving superintelligence) figure out and follow the intentions of its authors? People seem to jump to whatever conclusion supports their attitude toward AGI risk without much analysis of why others disagree, and Barrat follows that pattern.

I can imagine either possibility. If the easiest way to encode a goal system in an AGI is something like “output chess moves which according to the rules of chess will result in checkmate”, the narrow interpretation looks likely (turning the planet into computronium might help satisfy that goal).

An apparently harder approach would have the AGI consult a human arbiter to figure out whether it wins the chess game – “human arbiter” isn’t easy to encode in typical software. But AGI wouldn’t be typical software. It’s not obviously wrong to believe that software smart enough to take over the world would be smart enough to handle hard concepts like that. I’d like to see someone pin down people who think this is the obvious result and get them to explain how they imagine the AGI handling the goal before it reaches human-level intelligence.

He mentions some past events that might provide analogies for how AGI will interact with us, but I’m disappointed by how little thought he puts into this.

His examples of contact between technologically advanced beings and less advanced ones all refer to Europeans contacting Native Americans. I’d like to have seen a wider variety of analogies, e.g.:

  • Japan’s contact with the West after centuries of isolation
  • the interaction between Neanderthals and humans
  • the contact that resulted in mitochondria becoming part of our cells

He quotes Vinge saying an AGI ‘would not be humankind’s “tool” – any more than humans are the tools of rabbits or robins or chimpanzees.’ I’d say that humans are sometimes the tools of human DNA, which raises more complex questions of how well the DNA’s interests are served.

The book contains many questionable digressions which seem to be designed to entertain.

He claims Google must have an AGI project in spite of denials by Google’s Peter Norvig (this was before it bought DeepMind). But the evidence he uses to back up this claim is that Google thinks something like AGI would be desirable. The obvious conclusion would be that Google did not then think it had the skill to usefully work on AGI, which would be a sensible position given the history of AGI.

He thinks there’s something paradoxical about Eliezer Yudkowsky wanting to keep some information about himself private while putting lots of personal information on the web. The specific examples Barrat gives strongly suggest that Eliezer doesn’t value the standard notion of privacy, but wants to limit people’s ability to distract him. Barrat also says Eliezer “gave up reading for fun several years ago”, which will surprise those who see him frequently mention works of fiction in his Author’s Notes.

All this makes me wonder who the book’s target audience is. It seems to be someone less sophisticated than a person who could write an AGI.

Discussions asking whether “Snowball Earth” triggered animal evolution (see the bottom half of that page) suggest increasing evidence that the Snowball Earth hypothesis may explain an important part of why spacefaring civilizations seem rare.

photosynthetic organisms are limited by nutrients, most often nitrogen or phosphorous

the glaciations led to high phosphorous concentrations, which led to high productivity, which led to high oxygen in the oceans and atmosphere, which allowed for animal evolution to be triggered and thus the rise of the metazoans.

This seems quite speculative, but if true it might mean that our planet needed a snowball earth effect for complex life to evolve, but also needed that snowball earth period to be followed by hundreds of millions of years without another snowball earth period that would wipe out complex life. It’s easy to imagine that the conditions needed to produce one snowball earth effect make it very unusual for the planet to escape repeated snowball earth events for as long as it did, thus explaining more of the Fermi paradox than seemed previously possible.
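One way to make this argument concrete is a toy model, assuming (purely for illustration) that snowball events arrive as a Poisson process whose constant rate is set by a planet’s geology. The chance of at least one event within a window t1, followed by none during a quiet period T, is (1 − e^(−λ·t1))·e^(−λ·T). Setting the derivative to zero gives the most favorable rate λ* = ln(1 + t1/T)/t1, and even at that rate the combination stays unlikely when T is much longer than t1. The function names and the 100 and 600 million year windows below are hypothetical choices, not estimates from the discussions cited.

```python
import math

def p_one_then_quiet(rate: float, t_first: float, t_quiet: float) -> float:
    """Toy Poisson model: P(at least one snowball event within t_first)
    times P(no further event during the following t_quiet)."""
    return (1.0 - math.exp(-rate * t_first)) * math.exp(-rate * t_quiet)

def best_rate(t_first: float, t_quiet: float) -> float:
    """Rate that maximizes p_one_then_quiet, from setting its derivative to zero."""
    return math.log(1.0 + t_first / t_quiet) / t_first

# Hypothetical windows, in millions of years.
T_FIRST, T_QUIET = 100.0, 600.0
rate = best_rate(T_FIRST, T_QUIET)
best = p_one_then_quiet(rate, T_FIRST, T_QUIET)
print(f"best rate {rate:.5f} per Myr gives probability {best:.3f}")
```

With these windows even the most favorable rate gives only about 0.057: geology that makes one snowball likely also makes a second one likely. Whether real snowball events resemble a constant-rate Poisson process is itself an assumption.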

The most interesting talk at the Singularity Summit 2010 was Shane Legg’s description of an Algorithmic Intelligence Quotient (AIQ) test that measures something intelligence-like automatically in a way that can test AI programs (or at least the Monte-Carlo AIXI that he uses) on 1000+ environments.

He had a mathematical formula which he thinks rigorously defines intelligence. But he didn’t specify what he meant by the set of possible environments, saying that would be a 50 page paper (he said a good deal of the work on the test had been done last week, so presumably he’s still working on the project). He also included a term that applies Occam’s razor which I didn’t completely understand, but it seems likely that that should have a fairly non-controversial effect.
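For readers who want the shape of such a formula, the published Legg-Hutter definition of universal intelligence scores a policy π as Υ(π) = Σ_μ 2^(−K(μ))·V_μ^π. Here V_μ^π is the expected reward the policy earns in environment μ, and the 2^(−K(μ)) weight is the Occam’s razor term, so simpler environments count for more. Whether the talk’s formula was exactly this is an assumption. Kolmogorov complexity K is uncomputable, so this sketch substitutes a hand-assigned description length in bits. The two environments, the two agents, and all names are hypothetical toys, not part of the AIQ test.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A policy maps the symbols observed so far to its next action.
Policy = Callable[[Sequence[int]], int]

@dataclass
class Environment:
    name: str
    description_bits: int          # stand-in for the uncomputable K(mu)
    target: Callable[[int], int]   # rewarded action at step t

    def value(self, policy: Policy, steps: int = 20) -> float:
        """Fraction of steps on which the policy emits the rewarded action (in [0, 1])."""
        history: List[int] = []
        hits = 0
        for t in range(steps):
            correct = self.target(t)
            hits += int(policy(history) == correct)
            history.append(correct)  # the agent sees the right answer afterward
        return hits / steps

def universal_intelligence(policy: Policy, envs: List[Environment], steps: int = 20) -> float:
    """Occam-weighted score: sum over environments of 2^-bits * value."""
    return sum(2.0 ** -env.description_bits * env.value(policy, steps) for env in envs)

ENVS = [
    Environment("constant zero", 2, lambda t: 0),
    Environment("alternating", 3, lambda t: t % 2),
]

def always_zero(history: Sequence[int]) -> int:
    return 0

def alternator(history: Sequence[int]) -> int:
    return 1 - history[-1] if history else 0

for agent in (always_zero, alternator):
    print(agent.__name__, universal_intelligence(agent, ENVS))
```

Each agent is perfect on one environment, but always_zero scores 0.3125 against about 0.1375 for alternator because it masters the simpler one. That is the Occam weighting at work, and it shows why the choice of environments and description lengths matters so much to the final score.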

The environments sound like they imitate individual questions on an IQ test, but with a much wider range of difficulties. We need a more complete description of the set of environments he uses in order to evaluate whether they’re heavily biased toward what Monte-Carlo AIXI does well or whether they closely resemble the environments an AI will find in the real world. He described two reasons for having some confidence in his set of environments: different subsets provided roughly similar results, and a human taking a small subset of the test found some environments easy, some very challenging, and some too hard to understand.

It sounds like with a few more months’ worth of effort, he could generate a series of results that show a trend in the AIQ of the best AI program in any given year, and also the AIQ of some smart humans (although he implied it would take a long time for a human to complete a test). That would give us some idea of whether AI workers have been making steady progress, and if so when the trend is likely to cross human AIQ levels. An educated guess about when AI will have a major impact on the world should help a bit in preparing for it.

A more disturbing possibility is that this test will be used as a fitness function for genetic programming. Given sufficient computing power, that looks likely to generate superhuman intelligence that is almost certainly unfriendly to humans. I’m confident that sufficient computing power is not available yet, but my confidence will decline over time.

Brian Wang has a few more notes on this talk

The Global Catastrophic Risks conference last Friday was a mix of good and bad talks.
By far the most provocative was Josh’s talk about “the Weather Machine”. This would consist of small (under 1 cm) balloons made of material a few atoms thick (i.e. requiring nanotechnology that won’t be available for a couple of decades), filled with hydrogen and with a mirror in the equatorial plane. They would have enough communications and orientation control to be individually pointed wherever the entity in charge of them wants. They would float 20 miles above the earth’s surface and form a nearly continuous layer surrounding the planet.
This machine would have a few orders of magnitude more power over atmospheric temperatures than is needed to compensate for the warming caused by greenhouse gases this century, although it would only be a partial solution to the waste-heat problem further in the future that Freitas worries about in his discussion of the global hypsithermal limit.
The military implications make me hope it won’t be possible to make it as powerful as Josh claims. If 10 percent of the mirrors targeted one location, it would be difficult for anyone in the target area to survive. I suspect defensive mirrors would be of some use, but there would still be serious heating of the atmosphere near the mirrors. Josh claims that it could be designed with a dead man’s switch that would cause a snowball earth effect if the entity in charge were destroyed, but it’s not obvious why the balloons couldn’t be destroyed in that scenario. Later in the weekend Chris Hibbert raised concerns about how secure it would be against unauthorized people hacking into it, and I wasn’t reassured by Josh’s answer.
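A rough back-of-envelope check of both claims, under heavy simplifying assumptions. It uses approximate textbook values for the solar constant (about 1361 W/m²) and Earth’s mean radius, and takes the commonly cited forcing of about 3.7 W/m² for doubled CO2 as a stand-in for this century’s greenhouse warming. It treats 10 percent of the mirrors as redirecting 10 percent of the sunlight Earth intercepts, and it ignores mirror losses, atmospheric absorption, and which mirrors can actually see a given target. The 100 km by 100 km target area is a hypothetical choice.

```python
import math

# Approximate physical constants; rounded textbook values.
SOLAR_CONSTANT = 1361.0       # W/m^2 of sunlight at the top of the atmosphere
EARTH_RADIUS = 6.371e6        # m, mean radius
CO2_DOUBLING_FORCING = 3.7    # W/m^2, commonly cited forcing for doubled CO2

def intercepted_power() -> float:
    """Sunlight intercepted by Earth's cross-section, in watts."""
    return SOLAR_CONSTANT * math.pi * EARTH_RADIUS ** 2

def offset_fraction(forcing: float = CO2_DOUBLING_FORCING) -> float:
    """Fraction of incoming sunlight to redirect to cancel a global-mean
    forcing; incoming sunlight averages S/4 over the whole sphere."""
    return forcing / (SOLAR_CONSTANT / 4)

def concentration_in_suns(mirror_fraction: float, target_area_m2: float) -> float:
    """Intensity on a target, in multiples of full overhead sunlight, if that
    fraction of the intercepted power were delivered there with no losses."""
    return mirror_fraction * intercepted_power() / target_area_m2 / SOLAR_CONSTANT

print(f"sunlight to redirect for a CO2 doubling: {offset_fraction():.1%}")
print(f"10% of mirrors on a 100 km square: {concentration_in_suns(0.1, 1e10):,.0f} suns")
```

Offsetting a CO2 doubling needs roughly 1% of incoming sunlight redirected, so a layer able to steer nearly all of it has about two orders of magnitude of headroom by this crude measure. Aiming a tenth of it at a 100 km square would deliver on the order of a thousand times normal sunlight there.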

James Hughes gave a talk advocating world government. I was disappointed by his inability to imagine that it could result in power becoming too centralized. Nick Bostrom’s discussions of this subject are much more thoughtful.

Alan Goldstein gave a talk about the A-Prize and defining a concept called the carbon barrier to distinguish biological from non-biological life. Josh pointed out that as stated all life fit Goldstein’s definition of biological (since any information can be encoded in DNA). Goldstein modified his definition to avoid that, and then other people mentioned reports such as this which imply that humans don’t fall within Goldstein’s definition of biological due to inheritance of information through means other than DNA. Goldstein seemed unable to understand that objection.

Book review: Global Catastrophic Risks by Nick Bostrom and Milan Cirkovic.
This is a relatively comprehensive collection of thoughtful essays about the risks of a major catastrophe (mainly those that would kill a billion or more people).
Probably the most important chapter is the one on risks associated with AI, since few people attempting to create an AI seem to understand the possibilities it describes. It makes some implausible claims about the speed with which an AI could take over the world, but the argument they are used to support only requires that a first-mover advantage be important, and that depends only weakly on assumptions about the speed with which AI will improve.
The risk of a large fraction of humanity being killed by a super-volcano is apparently higher than the risk from asteroids, but volcanoes have more of a limit on their maximum size, so they appear to pose less risk of human extinction.
Early detection can’t handle the risks of asteroids and comets as well as I thought, because some dark comets can’t be detected with current technology until it’s far too late. It seems we ought to start thinking about better detection systems, which would probably require large improvements in the cost-effectiveness of space-based telescopes or other sensors.
Many of the volcano and asteroid deaths would be due to crop failures from cold weather. Since mid-ocean temperatures are more stable than land temperatures, ocean-based aquaculture would help mitigate this risk.
The climate change chapter seems much more objective and credible than what I’ve previously read on the subject, but is technical enough that it won’t be widely read, and it won’t satisfy anyone who is looking for arguments to justify their favorite policy. The best part is a list of possible instabilities which appear unlikely but which aren’t understood well enough to evaluate with any confidence.
The chapter on plagues mentions one surprising risk – better sanitation made polio more dangerous by altering the age at which it infected people. If I’d written the chapter, I’d have mentioned Ewald’s analysis of how human behavior influences the evolution of strains which are more or less virulent.
There’s good news about nuclear proliferation which has been under-reported – a fair number of countries have abandoned nuclear weapons programs, and a few have given up nuclear weapons. So if there’s any trend, it’s toward fewer countries trying to build them, and a stable number of countries possessing them. The bad news is we don’t know whether nanotechnology will change that by drastically reducing the effort needed to build them.
The chapter on totalitarianism discusses some uncomfortable tradeoffs between the benefits of some sort of world government and the harm that such government might cause. One interesting claim:

totalitarian regimes are less likely to foresee disasters, but are in some ways better-equipped to deal with disasters that they take seriously.

This post is a response to a challenge on Overcoming Bias to spend $10 trillion sensibly.
Here’s my proposed allocation (spending to be spread out over 10-20 years):

  • $5 trillion on drug patent buyouts and prizes for new drugs put in the public domain, with the prizes mostly allocated in proportion to the quality adjusted life years attributable to the drug.
  • $1 trillion on establishing a few dozen separate clusters of seasteads and on facilitating migration of people from poor/oppressive countries by rewarding jurisdictions in proportion to the number of immigrants they accept from poorer / less free regions. (I’m guessing that most of those rewards will go to seasteads, many of which will be created by other people partly in hopes of getting some of these rewards).

    This would also have the side effect of significantly reducing the harm that humans might experience from global warming or an ice age, since ocean climates have less extreme temperatures, seasteads will probably not depend on rainfall to grow food, and seasteads can move somewhat to locations with better temperatures.
  • $1 trillion on improving political systems, mostly through prizes that bear some resemblance to The Mo Ibrahim Prize for Achievement in African Leadership (but not limited to democratically elected leaders and not limited to Africa). If the top 100 or so politicians in about 100 countries are eligible, I could set the average reward at about $100 million per person. Of course, nowhere near all of them will qualify, so a fair amount will be left over for those not yet in office.
  • $0.5 trillion on subsidizing trading on prediction markets that are designed to enable futarchy. This level of subsidy is far enough from anything that has been tried that there’s no way to guess whether this is a wasteful level.
  • $1 trillion on existential risks
    Some unknown fraction of this would go to persuading people not to work on AGI without providing arguments that they will produce a safe goal system for any AI they create. Once I’m satisfied that the risks associated with AI are under control, much of the remaining money will go toward establishing societies in the asteroid belt and then outside the solar system.
  • $0.5 trillion on communications / computing hardware for everyone who can’t currently afford that.
  • $1 trillion I’d save for ideas I think of later.
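A quick check, in billions of dollars, that the allocation above adds up to $10 trillion, and that a $1 trillion prize pool spread over the top 100 or so politicians in about 100 countries averages $100 million each. The category labels are shorthand for the bullets, and the average assumes every eligible politician is paid, which the text notes will not happen.

```python
# Allocations from the list above, in billions of US dollars.
allocations = {
    "drug patent buyouts and prizes": 5_000,
    "seasteads and migration rewards": 1_000,
    "political system prizes": 1_000,
    "prediction market subsidies": 500,
    "existential risks": 1_000,
    "communications / computing hardware": 500,
    "reserve for later ideas": 1_000,
}
total_billions = sum(allocations.values())

# Prize arithmetic: about 100 politicians in each of about 100 countries.
eligible = 100 * 100
average_prize_dollars = allocations["political system prizes"] * 1_000_000_000 // eligible

print(f"total: ${total_billions / 1000:.1f} trillion")
print(f"average prize: ${average_prize_dollars:,}")
```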

I’m not counting a bunch of other projects that would use up less than $100 billion since they’re small enough to fit in the rounding errors of the ones I’ve counted (the Methuselah Mouse prize, desalinization and other water purification technologies, developing nanotech, preparing for the risks of nanotech, uploading, cryonics, nature preserves, etc).

Steve Omohundro has recently written a paper and given a talk (a video should become available soon) on AI ethics with arguments whose most important concerns resemble Eliezer Yudkowsky’s. I find Steve’s style more organized and more likely to convince mainstream researchers than Eliezer’s best attempt so far.
Steve avoids Eliezer’s suspicious claims about how fast AI will take off, and phrases his arguments in ways that are largely independent of the takeoff speed. But a sentence or two in the conclusion of his paper suggests that he is leaning toward solutions which assume multiple AIs will be able to safeguard against a single AI imposing its goals on the world. He doesn’t appear to have a good reason to consider this assumption reliable, but at least he doesn’t show the kind of disturbing certainty that Eliezer has about the first self-improving AI becoming powerful enough to take over the world.
Possibly the most important news in Steve’s talk was his statement that he had largely stopped working to create intelligent software due to his concerns about safely specifying goals for an AI. He indicated that one important insight that contributed to this change of mind came when Carl Shulman pointed out a flaw in Steve’s proposal for a utility function which included a goal of the AI shutting itself off after a specified time (the flaw involves a small chance of physics being different from apparent physics and how the AI will evaluate expected utilities resulting from that improbable physics).

Tim Freeman has a paper which clarifies many of the issues that need to be solved for humans to coexist with a superhuman AI. It comes close to what we would need if we had unlimited computing power. I will try to amplify some of the criticisms of it from the sl4 mailing list.
It errs on the side of our current intuitions about what I consider to be subgoals, rather than trusting the AI’s reasoning to find good subgoals to meet primary human goal(s). Another way to phrase that would be that it fiddles with parameters to get special-case results that fit our intuitions rather than focusing on general purpose solutions that would be more likely to produce good results in conditions that we haven’t yet imagined.
For example, concern about whether the AI pays the grocer seems misplaced. If our current intuitions about property rights continue to be good guidelines for maximizing human utility in a world with a powerful AI, why would that AI not reach that conclusion by inferring human utility functions from observed behavior and modeling the effects of property rights on human utility? If not, then why shouldn’t we accept that the AI has decided on something better than property rights (assuming our other methods of verifying that the AI is optimizing human utility show no flaws)?
Is it because we lack decent methods of verifying the AI’s effects on phenomena such as happiness that are more directly related to our utility functions? If so, it would seem to imply that we have an inadequate understanding of what we mean by maximizing utility. I didn’t see a clear explanation of how the AI would infer utility functions from observing human behavior (maybe the source code, which I haven’t read, clarifies it), but that appears to be roughly how humans at their best make the equivalent moral judgments.
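To make “inferring human utility functions from observed behavior” concrete, here is one standard technique, swapped in for illustration since the text doesn’t describe Tim’s method. It models a person as noisily rational (choosing each option with probability proportional to exp(utility)), assumes utility is linear in a few observable features, and fits the weights by maximizing the likelihood of the observed choices. The features, the “true” weights used to simulate choices, and every name below are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Option = Tuple[float, ...]               # feature values describing one option
Observation = Tuple[List[Option], int]   # (options offered, index chosen)

def utilities(weights: Sequence[float], options: List[Option]) -> List[float]:
    """Linear utility of each option under the given feature weights."""
    return [sum(w * f for w, f in zip(weights, opt)) for opt in options]

def softmax(utils: List[float]) -> List[float]:
    """Choice probabilities of a noisily rational (Boltzmann) chooser."""
    m = max(utils)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]

def infer_weights(observations: List[Observation], n_features: int,
                  lr: float = 0.5, steps: int = 300) -> List[float]:
    """Gradient ascent on the log-likelihood of the observed choices. For this
    model the gradient is (features of the chosen option) minus (expected
    features under the current choice probabilities)."""
    w = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for options, chosen in observations:
            probs = softmax(utilities(w, options))
            for k in range(n_features):
                expected = sum(p * opt[k] for p, opt in zip(probs, options))
                grad[k] += options[chosen][k] - expected
        w = [wk + lr * gk / len(observations) for wk, gk in zip(w, grad)]
    return w

def simulate_choices(true_weights: Sequence[float], n_obs: int,
                     n_options: int = 3, seed: int = 0) -> List[Observation]:
    """Generate synthetic choices from a chooser with known (hypothetical) weights."""
    rng = random.Random(seed)
    data: List[Observation] = []
    for _ in range(n_obs):
        options = [tuple(rng.random() for _ in true_weights) for _ in range(n_options)]
        probs = softmax(utilities(true_weights, options))
        chosen = rng.choices(range(n_options), weights=probs)[0]
        data.append((options, chosen))
    return data

# Hypothetical features: (benefit to the chooser, price paid).
TRUE_WEIGHTS = (3.0, -2.0)
observations = simulate_choices(TRUE_WEIGHTS, n_obs=300)
estimated = infer_weights(observations, n_features=2)
print("estimated weights:", [round(x, 2) for x in estimated])
```

The estimates should land roughly near the simulated weights, positive on benefit and negative on price. The hard part in Tim’s setting is what this toy assumes away: knowing which features matter, and whether people’s choices reflect their utility rather than their mistakes.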
I see similar problems with designing the AI to produce the “correct” result with Pascal’s Wager. Tim says “If Heaven and Hell enter into a decision about buying apples, the outcome seems difficult to predict”. Since humans have a poor track record at thinking rationally about very small probabilities and phenomena such as Heaven that are hard to observe, I wouldn’t expect AI unpredictability in this area to be evidence of a problem. It seems more likely that humans are evaluating Pascal’s Wager incorrectly than that a rational AI which can infer most aspects of human utility functions from human behavior will evaluate it incorrectly.
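Tim’s apple example reflects how expected utility treats a tiny probability attached to an enormous payoff: the product can swamp every ordinary consideration. Here is a minimal sketch with made-up numbers (a one-in-a-trillion probability and a payoff of 10^15 ordinary units, both hypothetical), plus the probability threshold above which the exotic outcome dominates.

```python
from typing import Iterable, Tuple

def expected_utility(outcomes: Iterable[Tuple[float, float]]) -> float:
    """Sum of probability * utility over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

def dominance_threshold(ordinary_gain: float, exotic_payoff: float) -> float:
    """Smallest probability at which the exotic payoff outweighs the ordinary gain."""
    return ordinary_gain / exotic_payoff

# Hypothetical numbers: buying an apple is worth 1 unit for certain, while
# abstaining carries a 1e-12 chance of an outcome worth 1e15 units.
buy_apple = [(1.0, 1.0)]
abstain = [(1.0 - 1e-12, 0.0), (1e-12, 1e15)]

print(expected_utility(buy_apple), expected_utility(abstain))
print(dominance_threshold(1.0, 1e15))
```

Abstaining scores about 1000 against 1 for buying, and any probability above 1e-15 would flip the choice. The decision therefore hinges on probability estimates far below anything that can be checked by observation.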

Book review: Beyond AI: Creating the Conscience of the Machine by J. Storrs Hall
The first two thirds of this book survey current knowledge of AI and make some guesses about when and how it will take off. This part is more eloquent than most books on similar subjects, and its somewhat unusual perspective makes it worth reading if you are reading several books on the subject. But ease of reading is the only criterion by which this section stands out as better than competing books.
The last five chapters are surprisingly good, and should shame most professional philosophers, whose writings are by comparison a waste of time.
His chapter on consciousness, qualia, and related issues is more concise and persuasive than anything else I’ve read on these subjects. It’s unlikely to change the opinions of people who have already thought about these subjects, but it’s an excellent place for people who are unfamiliar with them to start.
His use of game theory and evolutionary pressures is an excellent way to frame ethical discussions.
My biggest disappointment was that he starts to recognize a possibly important risk of AI when he says “disparities among the abilities of AIs … could negate the evolutionary pressure to reciprocal altruism”, but then seems to dismiss that thoughtlessly (“The notion of one single AI taking off and obtaining hegemony over the whole world by its own efforts is ludicrous”).
He probably has semi-plausible grounds for dismissing some of the scenarios of this nature that have been proposed (e.g. the speed at which some people imagine an AI would take off is improbable). But if AIs with sufficiently general purpose intelligence enhance their intelligence at disparate rates for long enough, the results would render most of the book’s discussion of ethics irrelevant. The time it took humans to accumulate knowledge didn’t give Neanderthals much opportunity to adapt. Would the result have been different if Neanderthals had learned to trade with humans? The answer is not obvious, and probably depends on Neanderthal learning abilities in ways that I don’t know how to analyze.
Also, his arguments for optimism aren’t quite as strong as he thinks. His point that career criminals are generally of low intelligence is reassuring if the number of criminals is all that matters. But when the harm done by one relatively smart criminal can be very large (e.g. Mao), it’s hard to say that the number of criminals is all that matters.
Here’s a nice quote from Mencken, part of which appears in this book:

Moral certainty is always a sign of cultural inferiority. The more uncivilized the man, the surer he is that he knows precisely what is right and what is wrong. All human progress, even in morals, has been the work of men who have doubted the current moral values, not of men who have whooped them up and tried to enforce them. The truly civilized man is always skeptical and tolerant, in this field as in all others. His culture is based on ‘I am not too sure.’

Another interesting tidbit is the anecdote that H.G. Wells predicted in 1907 that flying machines would be built. In spite of knowing a lot about attempts to build them, he wasn’t aware that the Wright brothers had succeeded in 1903.
If an AI started running in 2003 that has accumulated the knowledge of a 4-year old human and has the ability to continue learning at human or faster speeds, would we have noticed? Or would the reports we see about it sound too much like the reports of failed AIs for us to pay attention?